summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2022-08-31netfilter: remove nf_conntrack_helper sysctl and modparam togglesPablo Neira Ayuso
__nf_ct_try_assign_helper() remains in place but it now requires a template to configure the helper. A toggle to disable automatic helper assignment was added by: a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") in 2012 to address the issues described in "Secure use of iptables and connection tracking helpers". Automatic conntrack helper assignment was disabled by: 3bb398d925ec ("netfilter: nf_ct_helper: disable automatic helper assignment") back in 2016. This patch removes the sysctl and modparam toggles, users now have to rely on explicit conntrack helper configuration via ruleset. Update tools/testing/selftests/netfilter/nft_conntrack_helper.sh to check that auto-assignment does not happen anymore. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-30bpftool: Add support for querying cgroup_iter linkHao Luo
Support dumping info of a cgroup_iter link. This includes showing the cgroup's id and the order for walking the cgroup hierarchy. Example output is as follows: > bpftool link show 1: iter prog 2 target_name bpf_map 2: iter prog 3 target_name bpf_prog 3: iter prog 12 target_name cgroup cgroup_id 72 order self_only > bpftool -p link show [{ "id": 1, "type": "iter", "prog_id": 2, "target_name": "bpf_map" },{ "id": 2, "type": "iter", "prog_id": 3, "target_name": "bpf_prog" },{ "id": 3, "type": "iter", "prog_id": 12, "target_name": "cgroup", "cgroup_id": 72, "order": "self_only" } ] Signed-off-by: Hao Luo <haoluo@google.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220829231828.1016835-1-haoluo@google.com Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
2022-08-30memblock tests: add tests for memblock_trim_memoryRebecca Mckeever
Add tests for memblock_trim_memory() for the following scenarios: - all regions aligned - one unaligned region that is smaller than the alignment - one unaligned region that is unaligned at the base - one unaligned region that is unaligned at the end Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/0e5f55154a3b66581e04ba3717978795cbc08a5b.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: add tests for memblock_*bottom_up functionsRebecca Mckeever
Add simple tests for memblock_set_bottom_up() and memblock_bottom_up(). Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/b03701d2faeaf00f7184e4b72903de4e5e939437.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_rawRebecca Mckeever
Update memblock_alloc_try_nid() tests so that they test either memblock_alloc_try_nid() or memblock_alloc_try_nid_raw() depending on the value of alloc_nid_test_flags. Run through all the existing tests in alloc_nid_api twice: once for memblock_alloc_try_nid() and once for memblock_alloc_try_nid_raw(). When the tests run memblock_alloc_try_nid(), they test that the entire memory region is zero. When the tests run memblock_alloc_try_nid_raw(), they test that the entire memory region is nonzero. The content of the memory region is initialized to nonzero, and we expect it to remain unchanged if running memblock_alloc_try_nid_raw(). Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/6fa8938f67872841c10a00afb042947d1d280a04.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: update alloc_api to test memblock_alloc_rawRebecca Mckeever
Update memblock_alloc() tests so that they test either memblock_alloc() or memblock_alloc_raw() depending on the value of alloc_test_flags. Run through all the existing tests in memblock_alloc_api twice: once for memblock_alloc() and once for memblock_alloc_raw(). When the tests run memblock_alloc(), they test that the entire memory region is zero. When the tests run memblock_alloc_raw(), they test that the entire memory region is nonzero. The content of the memory region is initialized to nonzero, and we expect it to remain unchanged if running memblock_alloc_raw(). Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/5a7cfb2f807ee2cb53ee77f9f5c910107b253d6e.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: add additional tests for basic api and memblock_allocRebecca Mckeever
Add tests for memblock_add(), memblock_reserve(), memblock_remove(), memblock_free(), and memblock_alloc() for the following test scenarios. memblock_add() and memblock_reserve(): - add/reserve a memory block in the gap between two existing memory blocks, and check that the blocks are merged into one region - try to add/reserve memblock regions that extend past PHYS_ADDR_MAX memblock_remove() and memblock_free(): - remove/free a region when it is the only available region + These tests ensure that the first region is overwritten with a "dummy" region when the last remaining region of that type is removed or freed. - remove/free() a region that overlaps with two existing regions of the relevant type - try to remove/free memblock regions that extend past PHYS_ADDR_MAX memblock_alloc(): - try to allocate a region that is larger than the total size of available memory (memblock.memory) Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/c23c0393c5b9a53fe7f676996913c629495e9727.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: add labels to verbose output for generic alloc testsRebecca Mckeever
Generic tests for memblock_alloc*() functions do not use separate functions for testing top-down and bottom-up allocation directions. Therefore, the function name that is displayed in the verbose testing output does not include the allocation direction. Add an additional prefix when running generic tests for memblock_alloc*() functions that indicates which allocation direction is set. The prefix will be displayed when the tests are run in verbose mode. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/fb76a42253d2a196a7daea29dd8121a69904f58e.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: update zeroed memory check for memblock_alloc_* testsRebecca Mckeever
Update the assert in memblock_alloc_try_nid() and memblock_alloc_from() tests that checks whether the memory is cleared so that it checks the entire chunk of allocated memory instead of just the first byte. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/24b3271751756100142e65b75284d43b4d30c9b7.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: update tests to check if memblock_alloc zeroed memoryRebecca Mckeever
Add an assert in memblock_alloc() tests where allocation is expected to occur. The assert checks whether the entire chunk of allocated memory is cleared. The current memblock_alloc() tests do not check whether the allocated memory was zeroed. memblock_alloc() should zero the allocated memory since it is a wrapper for memblock_alloc_try_nid(). Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/83ffb941b65074f40eb14552f8bfe5b71fe50abd.1661578349.git.remckee0@gmail.com
2022-08-30memblock tests: update reference to obsolete build option in commentsRebecca Mckeever
The VERBOSE build option was replaced with the --verbose runtime option, but the comments describing the ASSERT_*() macros still refer to the VERBOSE build option. Update these comments so that they refer to the --verbose runtime option. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/5f8a4c2bde34cc029282c68d47eda982d950f421.1660451025.git.remckee0@gmail.com
2022-08-30memblock tests: add command line help optionRebecca Mckeever
Add a help command line option to the help message. Add the help option to the short and long options so it will be recognized as a valid option. Usage: $ ./main -h Or: $ ./main --help Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Rebecca Mckeever <remckee0@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/0f3b93a79de78c0da1ca90f74fe35e9a85c7cf93.1660451025.git.remckee0@gmail.com
2022-08-29selftests/bpf: Fix connect4_prog tcp/socket header type conflictJames Hilliard
There is a potential for us to hit a type conflict when including netinet/tcp.h and sys/socket.h, we can replace both of these includes with linux/tcp.h and bpf_tcp_helpers.h to avoid this conflict. Fixes errors like the below when compiling with gcc BPF backend: In file included from /usr/include/netinet/tcp.h:91, from progs/connect4_prog.c:11: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char' 34 | typedef __INT8_TYPE__ int8_t; | ^~~~~~ In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/connect4_prog.c:10: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'} 24 | typedef __int8_t int8_t; | ^~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int' 43 | typedef __INT64_TYPE__ int64_t; | ^~~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'} 27 | typedef __int64_t int64_t; | ^~~~~~~ Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220829154710.3870139-1-james.hilliard1@gmail.com
2022-08-29selftests/bpf: Fix bind{4,6} tcp/socket header type conflictJames Hilliard
There is a potential for us to hit a type conflict when including netinet/tcp.h with sys/socket.h, we can remove these as they are not actually needed. Fixes errors like the below when compiling with gcc BPF backend: In file included from /usr/include/netinet/tcp.h:91, from progs/bind4_prog.c:10: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char' 34 | typedef __INT8_TYPE__ int8_t; | ^~~~~~ In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/bind4_prog.c:9: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'} 24 | typedef __int8_t int8_t; | ^~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int' 43 | typedef __INT64_TYPE__ int64_t; | ^~~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'} 27 | typedef __int64_t int64_t; | ^~~~~~~ make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1 Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220826052925.980431-1-james.hilliard1@gmail.com
2022-08-28Merge tag 'x86-urgent-2022-08-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-27perf stat: Capitalize topdown metrics' namesZhengjun Xing
Capitalize topdown metrics' names to follow the intel SDM. Before: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,094.05 msec cpu-clock # 225.026 CPUs utilized 842 context-switches # 3.691 /sec 224 cpu-migrations # 0.982 /sec 70 page-faults # 0.307 /sec 23,164,105 cycles # 0.000 GHz 29,403,446 instructions # 1.27 insn per cycle 5,268,185 branches # 23.097 K/sec 33,239 branch-misses # 0.63% of all branches 136,248,990 slots # 597.337 K/sec 32,976,450 topdown-retiring # 24.2% retiring 4,651,918 topdown-bad-spec # 3.4% bad speculation 26,148,695 topdown-fe-bound # 19.2% frontend bound 72,515,776 topdown-be-bound # 53.2% backend bound 6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations 3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears 16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth 41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound 1.013634593 seconds time elapsed After: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,081.94 msec cpu-clock # 225.003 CPUs utilized 824 context-switches # 3.613 /sec 224 cpu-migrations # 0.982 /sec 67 page-faults # 0.294 /sec 22,647,423 cycles # 0.000 GHz 28,870,551 instructions # 1.27 insn per cycle 5,167,099 branches # 22.655 K/sec 32,383 branch-misses # 0.63% of all branches 133,411,074 slots # 584.926 K/sec 32,352,607 topdown-retiring # 24.3% Retiring 4,456,977 topdown-bad-spec # 3.3% Bad Speculation 25,626,487 topdown-fe-bound # 19.2% Frontend Bound 70,955,316 topdown-be-bound # 53.2% Backend Bound 5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations 3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears 16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth 40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound 1.013683125 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf docs: Update the documentation for the save_type filterKan Liang
Update the documentation to reflect the kernel changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=addressIan Rogers
An array of strings is passed to cmd_record but not freed. As cmd_record modifies the array, add another array as a copy that can be mutated allowing the original array contents to all be freed. Detected with -fsanitize=address. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf record: Fix manpage formatting of description of support to hybrid systemsAndi Kleen
The Intel hybrid description is written in a different style than the rest of the perf record man page. There were some new command line options added after it which resulted in very strange section ordering. Move the hybrid include last. Also the sub sections in the hybrid document don't fit the record manpage well (especially since it talks about all kinds of unrelated commands). I left this for now, but would be better to separate this properly in the different man pages. It would be better to use sub sections for the other sections, but these don't seem to be supported in AsciiDoc? Some of the examples are still misrendered in the manpage with an indented troff command, but I don't know how to fix that. In any case it's now better than before. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf test: Stat test for repeat with a weak groupIan Rogers
Breaking a weak group requires multiple passes of an evlist, with multiple runs this can introduce bugs ultimately leading to segfaults. Add a test to cover this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf stat: Clear evsel->reset_group for each stat runIan Rogers
If a weak group is broken then the reset_group flag remains set for the next run. Having reset_group set means the counter isn't created and ultimately a segfault. A simple reproduction of this is: # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W which will be added as a test in the next patch. Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27tools kvm headers arm64: Update KVM header from the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: ae3b1da95413614f ("KVM: arm64: Fix compile error due to sign extension") That doesn't result in any changes in tooling (when built on x86), only addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf python: Fix build when PYTHON_CONFIG is user suppliedJames Clark
The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons. Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention. This error was spotted in a build failure an odd cross compilation environment after commit 4c41cb46a732fe82 ("perf python: Prefer python3") was merged. Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-26bpf: Fix a few typos in BPF helpers documentationQuentin Monnet
Address a few typos in the documentation for the BPF helper functions. They were reported by Jakub [0], who ran spell checkers on the generated man page [1]. [0] https://lore.kernel.org/linux-man/d22dcd47-023c-8f52-d369-7b5308e6c842@gmail.com/T/#mb02e7d4b7fb61d98fa914c77b581184e9a9537af [1] https://lore.kernel.org/linux-man/eb6a1e41-c48e-ac45-5154-ac57a2c76108@gmail.com/T/#m4a8d1b003616928013ffcd1450437309ab652f9f v3: Do not copy unrelated (and breaking) elements to tools/ header v2: Turn a ',' into a ';' Reported-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220825220806.107143-1-quentin@isovalent.com
2022-08-26selftests/bpf: Declare subprog_noise as static in tailcall_bpf2bpf4James Hilliard
Due to bpf_map_lookup_elem being declared static we need to also declare subprog_noise as static. Fixes the following error: progs/tailcall_bpf2bpf4.c:26:9: error: 'bpf_map_lookup_elem' is static but used in inline function 'subprog_noise' which is not static [-Werror] 26 | bpf_map_lookup_elem(&nop_table, &key); | ^~~~~~~~~~~~~~~~~~~ Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20220826035141.737919-1-james.hilliard1@gmail.com
2022-08-26selftests/bpf: fix type conflict in test_tc_dtimeJames Hilliard
The sys/socket.h header isn't required to build test_tc_dtime and may cause a type conflict. Fixes the following error: In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/test_tc_dtime.c:18: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: error: conflicting types for 'int8_t'; have '__int8_t' {aka 'signed char'} 24 | typedef __int8_t int8_t; | ^~~~~~ In file included from progs/test_tc_dtime.c:5: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'char'} 34 | typedef __INT8_TYPE__ int8_t; | ^~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: error: conflicting types for 'int64_t'; have '__int64_t' {aka 'long long int'} 27 | typedef __int64_t int64_t; | ^~~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long int'} 43 | typedef __INT64_TYPE__ int64_t; | ^~~~~~~ make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/test_tc_dtime.o] Error 1 Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Link: https://lore.kernel.org/r/20220826050703.869571-1-james.hilliard1@gmail.com Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2022-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel borkmann says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 14 day(s) which contain a total of 13 files changed, 61 insertions(+), 24 deletions(-). The main changes are: 1) Fix BPF verifier's precision tracking around BPF ring buffer, from Kumar Kartikeya Dwivedi. 2) Fix regression in tunnel key infra when passing FLOWI_FLAG_ANYSRC, from Eyal Birger. 3) Fix insufficient permissions for bpf_sys_bpf() helper, from YiFei Zhu. 4) Fix splat from hitting BUG when purging effective cgroup programs, from Pu Lehui. 5) Fix range tracking for array poke descriptors, from Daniel Borkmann. 6) Fix corrupted packets for XDP_SHARED_UMEM in aligned mode, from Magnus Karlsson. 7) Fix NULL pointer splat in BPF sockmap sk_msg_recvmsg(), from Liu Jian. 8) Add READ_ONCE() to bpf_jit_limit when reading from sysctl, from Kuniyuki Iwashima. 9) Add BPF selftest lru_bug check to s390x deny list, from Daniel Müller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-25libbpf: add map_get_fd_by_id and map_delete_elem in light skeletonBenjamin Tissoires
This allows to have a better control over maps from the kernel when preloading eBPF programs. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220824134055.1328882-8-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-26selftests/powerpc: Add a test for execute-only memoryNicholas Miehlbradt
This selftest is designed to cover execute-only protections on the Radix MMU but will also work with Hash. The tests are based on those found in pkey_exec_test with modifications to use the generic mprotect() instead of the pkey variants. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220817050640.406017-2-ruscur@russell.cc
2022-08-25bpf: Add CGROUP prefix to cgroup_iter_orderHao Luo
bpf_cgroup_iter_order is globally visible but the entries do not have CGROUP prefix. As requested by Andrii, put a CGROUP in the names in bpf_cgroup_iter_order. This patch fixes two previous commits: one introduced the API and the other uses the API in bpf selftest (that is, the selftest cgroup_hierarchical_stats). I tested this patch via the following command: test_progs -t cgroup,iter,btf_dump Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter") Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25Merge tag 'net-6.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec and netfilter (with one broken Fixes tag). Current release - new code bugs: - dsa: don't dereference NULL extack in dsa_slave_changeupper() - dpaa: fix <1G ethernet on LS1046ARDB - neigh: don't call kfree_skb() under spin_lock_irqsave() Previous releases - regressions: - r8152: fix the RX FIFO settings when suspending - dsa: microchip: keep compatibility with device tree blobs with no phy-mode - Revert "net: macsec: update SCI upon MAC address change." - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367 Previous releases - always broken: - netfilter: conntrack: work around exceeded TCP receive window - ipsec: fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid - moxa: get rid of asymmetry in DMA mapping/unmapping - dsa: microchip: make learning configurable and keep it off while standalone - ice: xsk: prohibit usage of non-balanced queue id - rxrpc: fix locking in rxrpc's sendmsg Misc: - another chunk of sysctl data race silencing" * tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) net: lantiq_xrx200: restore buffer if memory allocation failed net: lantiq_xrx200: fix lock under memory pressure net: lantiq_xrx200: confirm skb is allocated before using net: stmmac: work around sporadic tx issue on link-up ionic: VF initial random MAC address if no assigned mac ionic: fix up issues with handling EAGAIN on FW cmds ionic: clear broken state on generation change rxrpc: Fix locking in rxrpc's sendmsg net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 MAINTAINERS: rectify file entry in BONDING DRIVER i40e: Fix incorrect address type for IPv6 flow rules ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter net: Fix a data-race around sysctl_somaxconn. net: Fix a data-race around netdev_unregister_timeout_secs. net: Fix a data-race around gro_normal_batch. net: Fix data-races around sysctl_devconf_inherit_init_net. net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. net: Fix a data-race around netdev_budget_usecs. net: Fix data-races around sysctl_max_skb_frags. net: Fix a data-race around netdev_budget. ...
2022-08-25selftests/net: fix reinitialization of TEST_PROGS in net self tests.Adel Abouchaev
Assinging will drop all previous tests. Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting") Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220824184351.3759862-1-adel.abushaev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25selftests/bpf: Add regression test for pruning fixKumar Kartikeya Dwivedi
Add a test to ensure we do mark_chain_precision for the argument type ARG_CONST_ALLOC_SIZE_OR_ZERO. For other argument types, this was already done, but propagation for missing for this case. Without the fix, this test case loads successfully. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823185500.467-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25bpftool: Fix a wrong type cast in btf_dumper_intLam Thai
When `data` points to a boolean value, casting it to `int *` is problematic and could lead to a wrong value being passed to `jsonw_bool`. Change the cast to `bool *` instead. Fixes: b12d6ec09730 ("bpf: btf: add btf print functionality") Signed-off-by: Lam Thai <lamthai@arista.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220824225859.9038-1-lamthai@arista.com
2022-08-25selftests/bpf: add a selftest for cgroup hierarchical stats collectionYosry Ahmed
Add a selftest that tests the whole workflow for collecting, aggregating (flushing), and displaying cgroup hierarchical stats. TL;DR: - Userspace program creates a cgroup hierarchy and induces memcg reclaim in parts of it. - Whenever reclaim happens, vmscan_start and vmscan_end update per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs have updates. - When userspace tries to read the stats, vmscan_dump calls rstat to flush the stats, and outputs the stats in text format to userspace (similar to cgroupfs stats). - rstat calls vmscan_flush once for every (cgroup, cpu) pair that has updates, vmscan_flush aggregates cpu readings and propagates updates to parents. - Userspace program makes sure the stats are aggregated and read correctly. Detailed explanation: - The test loads tracing bpf programs, vmscan_start and vmscan_end, to measure the latency of cgroup reclaim. Per-cgroup readings are stored in percpu maps for efficiency. When a cgroup reading is updated on a cpu, cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the rstat updated tree on that cpu. - A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for each cgroup. Reading this file invokes the program, which calls cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all cpus and cgroups that have updates in this cgroup's subtree. Afterwards, the stats are exposed to the user. vmscan_dump returns 1 to terminate iteration early, so that we only expose stats for one cgroup per read. - An ftrace program, vmscan_flush, is also loaded and attached to bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked once for each (cgroup, cpu) pair that has updates. cgroups are popped from the rstat tree in a bottom-up fashion, so calls will always be made for cgroups that have updates before their parents. The program aggregates percpu readings to a total per-cgroup reading, and also propagates them to the parent cgroup. After rstat flushing is over, all cgroups will have correct updated hierarchical readings (including all cpus and all their descendants). - Finally, the test creates a cgroup hierarchy and induces memcg reclaim in parts of it, and makes sure that the stats collection, aggregation, and reading workflow works as expected. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-6-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25selftests/bpf: extend cgroup helpersYosry Ahmed
This patch extends bpf selft cgroup_helpers [ID] n various ways: - Add enable_controllers() that allows tests to enable all or a subset of controllers for a specific cgroup. - Add join_cgroup_parent(). The cgroup workdir is based on the pid, therefore a spawned child cannot join the same cgroup hierarchy of the test through join_cgroup(). join_cgroup_parent() is used in child processes to join a cgroup under the parent's workdir. - Add write_cgroup_file() and write_cgroup_file_parent() (similar to join_cgroup_parent() above). - Add get_root_cgroup() for tests that need to do checks on root cgroup. - Distinguish relative and absolute cgroup paths in function arguments. Now relative paths are called relative_path, and absolute paths are called cgroup_path. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-5-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25selftests/bpf: Test cgroup_iter.Hao Luo
Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree of the following structure: ROOT (working cgroup) | PARENT / \ CHILD1 CHILD2 and tests the following scenarios: - invalid cgroup fd. - pre-order walk over descendants from PARENT. - post-order walk over descendants from PARENT. - walk of ancestors from PARENT. - process only a single object (i.e. PARENT). - early termination. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-3-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25bpf: Introduce cgroup iterHao Luo
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25x86/sev: Mark snp_abort() noreturnBorislav Petkov
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
2022-08-24selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addrJoanne Koong
This patch adds 2 new tests: sk_bind_sendto_listen and sk_connect_zero_addr. The sk_bind_sendto_listen test exercises the path where a socket's rcv saddr changes after it has been added to the binding tables, and then a listen() on the socket is invoked. The listen() should succeed. The sk_bind_sendto_listen test is copied over from one of syzbot's tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000 The sk_connect_zero_addr test exercises the path where the socket was never previously added to the binding tables and it gets assigned a saddr upon a connect() to address 0. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24selftests/net: Add test for timing a bind request to a port with a populated ↵Joanne Koong
bhash entry This test populates the bhash table for a given port with MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long a bind request on the port takes. When populating the bhash table, we create the sockets and then bind the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT are set). When timing how long a bind on the port takes, we bind on a different address without SO_REUSEPORT set. We do not set SO_REUSEPORT because we are interested in the case where the bind request does not go through the tb->fastreuseport path, which is fragile (eg tb->fastreuseport path does not work if binding with a different uid). To run the script: Usage: ./bind_bhash.sh [-6 | -4] [-p port] [-a address] 6: use ipv6 4: use ipv4 port: Port number address: ip address Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address "2001:0db8:0:f101::1" on port 443. On my local machine, I see: ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24selftests/bpf: Fix wrong size passed to bpf_setsockopt()Yang Yingliang
sizeof(new_cc) is not real memory size that new_cc points to; introduce a new_cc_len to store the size and then pass it to bpf_setsockopt(). Fixes: 31123c0360e0 ("selftests/bpf: bpf_setsockopt tests") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220824013907.380448-1-yangyingliang@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24selftests/bpf: Add cb_refs test to s390x deny listDaniel Müller
The cb_refs BPF selftest is failing execution on s390x machines. This is a newly added test that requires a feature not presently supported on this architecture. Denylist the test for this architecture. Fixes: 3cf7e7d8685c ("selftests/bpf: Add tests for reference state fixes for callbacks") Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220824163906.1186832-1-deso@posteo.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24selftests/bpf: Add tests for reference state fixes for callbacksKumar Kartikeya Dwivedi
These are regression tests to ensure we don't end up in invalid runtime state for helpers that execute callbacks multiple times. It exercises the fixes to verifier callback handling for reference state in previous patches. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823013226.24988-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhereStanislav Fomichev
For each hook, have a simple bpf_set_retval(bpf_get_retval) program and make sure it loads for the hooks we want. The exceptions are the hooks which don't propagate the error to the callers: - sockops - recvmsg - getpeername - getsockname - cg_skb ingress and egress Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-6-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: update bpf_{g,s}et_retval documentationStanislav Fomichev
* replace 'syscall' with 'upper layers', still mention that it's being exported via syscall errno * describe what happens in set_retval(-EPERM) + return 1 * describe what happens with bind's 'return 3' Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf, selftests: Test BPF_FLOW_DISSECTOR_CONTINUEShmulik Ladkani
The dissector program returns BPF_FLOW_DISSECTOR_CONTINUE (and avoids setting skb->flow_keys or last_dissection map) in case it encounters IP packets whose (outer) source address is 127.0.0.127. Additional test is added to prog_tests/flow_dissector.c which sets this address as test's pkk.iph.saddr, with the expected retval of BPF_FLOW_DISSECTOR_CONTINUE. Also, legacy test_flow_dissector.sh was similarly augmented. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-5-shmulik.ladkani@gmail.com
2022-08-23bpf, test_run: Propagate bpf_flow_dissect's retval to user's ↵Shmulik Ladkani
bpf_attr.test.retval Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK was set into 'bpf_attr.test.retval'. Augment this, so users can check the actual return code of the dissector program under test. Existing prog_tests/flow_dissector*.c tests were correspondingly changed to check against each test's expected retval. Also, tests' resulting 'flow_keys' are verified only in case the expected retval is BPF_OK. This allows adding new tests that expect non BPF_OK. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
2022-08-23bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progsShmulik Ladkani
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely replaces the flow-dissector logic with custom dissection logic. This forces implementors to write programs that handle dissection for any flows expected in the namespace. It makes sense for flow-dissector BPF programs to just augment the dissector with custom logic (e.g. dissecting certain flows or custom protocols), while enjoying the broad capabilities of the standard dissector for any other traffic. Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF programs may return this to indicate no dissection was made, and fallback to the standard dissector is requested. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com