linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-03-23	perf test: Add a shell test for 'perf stat --bpf-counters' new option	Song Liu
	Add a test to compare the output of perf-stat with and without option --bpf-counters. If the difference is more than 10%, the test is considered as failed. Committer testing: # perf test bpf-counters 86: perf stat --bpf-counters test : Ok # perf test -v bpf-counters 86: perf stat --bpf-counters test : --- start --- test child forked, pid 2433339 test child finished with 0 ---- end ---- perf stat --bpf-counters test: Ok # Signed-off-by: Song Liu <songliubraving@fb.com> Requested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/EC00E37D-8587-4662-8E30-7AD5F874FA84@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23	perf stat: Measure 't0' and 'ref_time' after enable_counters()	Song Liu
	Take measurements of 't0' and 'ref_time' after enable_counters(), so that they only measure the time consumed when the counters are enabled. Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Andi Kleen <andi@firstfloor.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20210316211837.910506-3-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23	perf stat: Introduce 'bperf' to share hardware PMCs with BPF	Song Liu
	The perf tool uses performance monitoring counters (PMCs) to monitor system performance. The PMCs are limited hardware resources. For example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu. Modern data center systems use these PMCs in many different ways: system level monitoring, (maybe nested) container level monitoring, per process monitoring, profiling (in sample mode), etc. In some cases, there are more active perf_events than available hardware PMCs. To allow all perf_events to have a chance to run, it is necessary to do expensive time multiplexing of events. On the other hand, many monitoring tools count the common metrics (cycles, instructions). It is a waste to have multiple tools create multiple perf_events of "cycles" and occupy multiple PMCs. bperf tries to reduce such wastes by allowing multiple perf_events of "cycles" or "instructions" (at different scopes) to share PMUs. Instead of having each perf-stat session to read its own perf_events, bperf uses BPF programs to read the perf_events and aggregate readings to BPF maps. Then, the perf-stat session(s) reads the values from these BPF maps. Please refer to the comment before the definition of bperf_ops for the description of bperf architecture. bperf is off by default. To enable it, pass --bpf-counters option to perf-stat. bperf uses a BPF hashmap to share information about BPF programs and maps used by bperf. This map is pinned to bpffs. The default path is /sys/fs/bpf/perf_attr_map. The user could change the path with option --bpf-attr-map. Committer testing: # dmesg\|grep "Performance Events" -A5 [ 0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver. [ 0.225280] ... version: 0 [ 0.225280] ... bit width: 48 [ 0.225281] ... generic registers: 6 [ 0.225281] ... value mask: 0000ffffffffffff [ 0.225281] ... max period: 00007fffffffffff # # for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done [1] 2436231 [2] 2436232 [3] 2436233 [4] 2436234 [5] 2436235 [6] 2436236 # perf stat -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 310,326,987 cycles (41.87%) 236,143,290 instructions # 0.76 insn per cycle (41.87%) 0.100800885 seconds time elapsed # We can see that the counters were enabled for this workload 41.87% of the time. Now with --bpf-counters: # for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done [1] 2436514 [2] 2436515 [3] 2436516 [4] 2436517 [5] 2436518 [6] 2436519 [7] 2436520 [8] 2436521 [9] 2436522 [10] 2436523 [11] 2436524 [12] 2436525 [13] 2436526 [14] 2436527 [15] 2436528 [16] 2436529 [17] 2436530 [18] 2436531 [19] 2436532 [20] 2436533 [21] 2436534 [22] 2436535 [23] 2436536 [24] 2436537 [25] 2436538 [26] 2436539 [27] 2436540 [28] 2436541 [29] 2436542 [30] 2436543 [31] 2436544 [32] 2436545 # # ls -la /sys/fs/bpf/perf_attr_map -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map # bpftool map \| grep bperf \| wc -l 64 # # bpftool map \| tail 1265: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1266: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1267: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 996 pids perf(2436545) 1268: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1269: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1270: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 997 pids perf(2436541) 1285: array name pid_iter.rodata flags 0x480 key 4B value 4B max_entries 1 memlock 4096B btf_id 1017 frozen pids bpftool(2437504) 1286: array flags 0x0 key 4B value 32B max_entries 1 memlock 4096B # # bpftool map dump id 1268 \| tail value (CPU 21): 8f f3 bc ca 00 00 00 00 80 fd 2a d1 4d 00 00 00 80 fd 2a d1 4d 00 00 00 value (CPU 22): 7e d5 64 4d 00 00 00 00 a4 8a 2e ee 4d 00 00 00 a4 8a 2e ee 4d 00 00 00 value (CPU 23): a7 78 3e 06 01 00 00 00 b2 34 94 f6 4d 00 00 00 b2 34 94 f6 4d 00 00 00 Found 1 element # bpftool map dump id 1268 \| tail value (CPU 21): c6 8b d9 ca 00 00 00 00 20 c6 fc 83 4e 00 00 00 20 c6 fc 83 4e 00 00 00 value (CPU 22): 9c b4 d2 4d 00 00 00 00 3e 0c df 89 4e 00 00 00 3e 0c df 89 4e 00 00 00 value (CPU 23): 18 43 66 06 01 00 00 00 5b 69 ed 83 4e 00 00 00 5b 69 ed 83 4e 00 00 00 Found 1 element # bpftool map dump id 1268 \| tail value (CPU 21): f2 6e db ca 00 00 00 00 92 67 4c ba 4e 00 00 00 92 67 4c ba 4e 00 00 00 value (CPU 22): dc 8e e1 4d 00 00 00 00 d9 32 7a c5 4e 00 00 00 d9 32 7a c5 4e 00 00 00 value (CPU 23): bd 2b 73 06 01 00 00 00 7c 73 87 bf 4e 00 00 00 7c 73 87 bf 4e 00 00 00 Found 1 element # # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 119,410,122 cycles 152,105,479 instructions # 1.27 insn per cycle 0.101395093 seconds time elapsed # See? We had the counters enabled all the time. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23	perf tools: Fix various typos in comments	Ingo Molnar
	Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23	Merge tag 'linux-kselftest-kunit-fixes-5.12-rc5.1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "Two fixes to the kunit tool from David Gow" * tag 'linux-kselftest-kunit-fixes-5.12-rc5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: Disable PAGE_POISONING under --alltests kunit: tool: Fix a python tuple typing error
2021-03-23	kselftest/arm64: mte: ksm_options: Fix fscanf warning	Andre Przywara
	Out of the box Ubuntu's 20.04 compiler warns about missing return value checks for fscanf() calls. Make GCC happy by checking whether we actually parsed one integer. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Mark Brown <broone@kernel.org> Link: https://lore.kernel.org/r/20210319165334.29213-4-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-23	kselftest/arm64: mte: Fix pthread linking	Andre Przywara
	The GCC manual suggests to use -pthread, when linking with the PThread library, also to add this switch to both the compilation and linking stages. Do as the manual says, to fix compilation with Ubuntu's 20.04 toolchain, which was getting -lpthread too early on the command line: ------------ /usr/bin/ld: /tmp/cc5zbo2A.o: in function `execute_test': tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c:86: undefined reference to `pthread_create' /usr/bin/ld: tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c:90: undefined reference to `pthread_join' ------------ Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Mark Brown <broone@kernel.org> Link: https://lore.kernel.org/r/20210319165334.29213-3-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-23	kselftest/arm64: mte: Fix compilation with native compiler	Andre Przywara
	The mte selftest Makefile contains a check for GCC, to add the memtag -march flag to the compiler options. This check fails if the compiler is not explicitly specified, so reverts to the standard "cc", in which case --version doesn't mention the "gcc" string we match against: $ cc --version \| head -n 1 cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 This will not add the -march switch to the command line, so compilation fails: mte_helper.S: Assembler messages: mte_helper.S:25: Error: selected processor does not support `irg x0,x0,xzr' mte_helper.S:38: Error: selected processor does not support `gmi x1,x0,xzr' ... Actually clang accepts the same -march option as well, so we can just drop this check and add this unconditionally to the command line, to avoid any future issues with this check altogether (gcc actually prints basename(argv[0]) when called with --version). Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Mark Brown <broone@kernel.org> Link: https://lore.kernel.org/r/20210319165334.29213-2-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-23	firmware_loader: Remove unnecessary conversion to bool	Jiapeng Chong
	Fix the following coccicheck warnings: ./tools/testing/selftests/firmware/fw_namespace.c:98:54-59: WARNING: conversion to bool not needed here. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/1613639529-41139-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-22	libbpf: Skip BTF fixup if object file has no BTF	Andrii Nakryiko
	Skip BTF fixup step when input object file is missing BTF altogether. Fixes: 8fd27bf69b86 ("libbpf: Add BPF static linker BTF and BTF.ext support") Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20210319205909.1748642-3-andrii@kernel.org
2021-03-22	timekeeping, clocksource: Fix various typos in comments	Ingo Molnar
	Fix ~56 single-word typos in timekeeping & clocksource code comments. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-kernel@vger.kernel.org
2021-03-22	torture: Fix kvm.sh --datestamp regex check	Paul E. McKenney
	Some versions of grep are happy to interpret a nonsensically placed "-" within a "[]" pattern as a dash, while others give an error message. This commit therefore places the "-" at the end of the expression where it was supposed to be in the first place. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Consolidate qemu-cmd duration editing into kvm-transform.sh	Paul E. McKenney
	Currently, kvm-again.sh updates the duration in the "seconds=" comment in the qemu-cmd file, but kvm-transform.sh updates the duration in the actual qemu command arguments. This is an accident waiting to happen. This commit therefore consolidates these updates into kvm-transform.sh. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Print proper vmlinux path for kvm-again.sh runs	Paul E. McKenney
	The kvm-again.sh script does not copy over the vmlinux files due to their large size. This means that a gdb run must use the vmlinux file from the original "res" directory. This commit therefore finds that directory and prints it out so that the user can copy and pasted the gdb command just as for the initial run. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment	Paul E. McKenney
	Because the TORTURE_TRUST_MAKE environment variable is not recorded, kvm-again.sh runs can result in the parse-build.sh script emitting false-positive "BUG: TREE03 no build" messages. These messages are intended to complain about any lack of compiler invocations when the --trust-make flag is not given to kvm.sh. However, when this flag is given to kvm.sh (and thus when TORTURE_TRUST_MAKE=y), lack of compiler invocations is expected behavior when rebuilding from identical source code. This commit therefore makes kvm-test-1-run.sh record the value of the TORTURE_TRUST_MAKE environment variable as an additional comment in the qemu-cmd file, and also makes kvm-again.sh reconstitute that variable from that comment. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Make kvm-transform.sh update jitter commands	Paul E. McKenney
	When rerunning an old run using kvm-again.sh, the jitter commands will re-use the original "res" directory. This works, but is clearly an accident waiting to happen. And this accident will happen with remote runs, where the original directory lives on some other system. This commit therefore updates the qemu-cmd commands to use the new res directory created for this specific run. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Add --duration argument to kvm-again.sh	Paul E. McKenney
	This commit adds a --duration argument to kvm-again.sh to allow the user to override the --duration specified for the original kvm.sh run. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Add kvm-again.sh to rerun a previous torture-test	Paul E. McKenney
	This commit adds a kvm-again.sh script that, given the results directory of a torture-test run, re-runs that test. This means that the kernels need not be rebuilt, but it also is a step towards running torture tests on remote systems. This commit also adds a kvm-test-1-run-batch.sh script that runs one batch out of the torture test. The idea is to copy a results directory tree to remote systems, then use kvm-test-1-run-batch.sh to run batches on these systems. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Create a "batches" file for build reuse	Paul E. McKenney
	This commit creates a "batches" file in the res/$ds directory, where $ds is the datestamp. This file contains the batches and the number of CPUs, for example: 1 TREE03 16 1 SRCU-P 8 2 TREE07 16 2 TREE01 8 3 TREE02 8 3 TREE04 8 3 TREE05 8 4 SRCU-N 4 4 TRACE01 4 4 TRACE02 4 4 RUDE01 2 4 RUDE01.2 2 4 TASKS01 2 4 TASKS03 2 4 SRCU-t 1 4 SRCU-u 1 4 TASKS02 1 4 TINY01 1 5 TINY02 1 5 TREE09 1 The first column is the batch number, the second the scenario number (possibly suffixed by a repetition number, as in "RUDE01.2"), and the third is the number of CPUs required by that scenario. The last line shows the number of CPUs expected by this batch file, which allows the run to be re-batched if a different number of CPUs is available. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: De-capitalize TORTURE_SUITE	Paul E. McKenney
	Although it might be unlikely that someone would name a scenario "TORTURE_SUITE", they are within their rights to do so. This script therefore renames the "TORTURE_SUITE" file in the top-level date-stamped directory within "res" to "torture_suite" to avoid this name collision. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Make upper-case-only no-dot no-slash scenario names official	Paul E. McKenney
	This commit enforces the defacto restriction on scenario names, which is that they contain neither "/", ".", nor lowercase alphabetic characters. This restriction avoids collisions between scenario names and the torture scripting's files and directories. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Rename SRCU-t and SRCU-u to avoid lowercase characters	Paul E. McKenney
	The convention that scenario names are all uppercase has two exceptions, SRCU-t and SRCU-u. This commit therefore renames them to SRCU-T and SRCU-U, respectively, to bring them in line with this convention. This in turn permits tighter argument checking in the torture-test scripting. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Remove no-mpstat error message	Paul E. McKenney
	The cpus2use.sh script complains if the mpstat command is not available, and instead uses all available CPUs. Unfortunately, this complaint goes to stdout, where it confuses invokers who expect a single number. This commit removes this error message in order to avoid this confusion. The tendency of late has been to give rcutorture a full system, so this should not cause issues. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs	Paul E. McKenney
	This commit records the process IDs of the kvm-test-1-run.sh and kvm-test-1-run-qemu.sh scripts to ease monitoring of remotely running instances of these scripts. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Record jitter start/stop commands	Paul E. McKenney
	Distributed runs of rcutorture will need to start and stop jittering on the remote hosts, which means that the commands must be communicated to those hosts. The commit therefore causes kvm.sh to place these commands in new TORTURE_JITTER_START and TORTURE_JITTER_STOP environment variables to communicate them to the scripts that will set this up. In addition, this commit causes kvm-test-1-run.sh to append these commands to each generated qemu-cmd file, which allows any remotely executing script to extract the needed commands from this file. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh	Paul E. McKenney
	Currently, kvm-test-1-run.sh both builds and runs an rcutorture kernel, which is inconvenient when it is necessary to re-run an old run or to carry out a run on a remote system. This commit therefore extracts the portion of kvm-test-1-run.sh that invoke qemu to actually run rcutorture and places it in kvm-test-1-run-qemu.sh. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd	Paul E. McKenney
	When re-running old rcutorture builds, if the original run involved gdb, the re-run also needs to do so. This commit therefore records the TORTURE_KCONFIG_GDB_ARG environment variable into the qemu-cmd file so that the re-run can access it. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	torture: Abstract jitter.sh start/stop into scripts	Paul E. McKenney
	This commit creates jitterstart.sh and jitterstop.sh scripts that handle the starting and stopping of the jitter.sh scripts. These must be sourced using the bash "." command to allow the generated script to wait on the backgrounded jitter.sh scripts. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22	pm-graph: Fix typo "accesible"	Ricardo Ribalda
	Trivial fix. Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-22	kselftest/arm64: sve: Do not use non-canonical FFR register value	Andre Przywara
	The "First Fault Register" (FFR) is an SVE register that mimics a predicate register, but clears bits when a load or store fails to handle an element of a vector. The supposed usage scenario is to initialise this register (using SETFFR), then read it later on to learn about elements that failed to load or store. Explicit writes to this register using the WRFFR instruction are only supposed to restore values previously read from the register (for context-switching only). As the manual describes, this register holds only certain values, it: "... contains a monotonic predicate value, in which starting from bit 0 there are zero or more 1 bits, followed only by 0 bits in any remaining bit positions." Any other value is UNPREDICTABLE and is not supposed to be "restored" into the register. The SVE test currently tries to write a signature pattern into the register, which is not a canonical FFR value. Apparently the existing setups treat UNPREDICTABLE as "read-as-written", but a new implementation actually only stores canonical values. As a consequence, the sve-test fails immediately when comparing the FFR value: ----------- # ./sve-test Vector length: 128 bits PID: 207 Mismatch: PID=207, iteration=0, reg=48 Expected [cf00] Got [0f00] Aborted ----------- Fix this by only populating the FFR with proper canonical values. Effectively the requirement described above limits us to 17 unique values over 16 bits worth of FFR, so we condense our signature down to 4 bits (2 bits from the PID, 2 bits from the generation) and generate the canonical pattern from it. Any bits describing elements above the minimum 128 bit are set to 0. This aligns the FFR usage to the architecture and fixes the test on microarchitectures implementing FFR in a more restricted way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviwed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210319120128.29452-1-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-03-21	Merge branch 'linus' into x86/cleanups, to resolve conflict	Ingo Molnar
	Conflicts: arch/x86/kernel/kprobes/ftrace.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller
	Alexei Starovoitov says: ==================== pull-request: bpf 2021-03-20 The following pull-request contains BPF updates for your net tree. We've added 5 non-merge commits during the last 3 day(s) which contain a total of 8 files changed, 155 insertions(+), 12 deletions(-). The main changes are: 1) Use correct nops in fexit trampoline, from Stanislav. 2) Fix BTF dump, from Jean-Philippe. 3) Fix umd memory leak, from Zqiang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19	selftests/bpf: Add selftest for pointer-to-array-of-struct BTF dump	Jean-Philippe Brucker
	Bpftool used to issue forward declarations for a struct used as part of a pointer to array, which is invalid. Add a test to check that the struct is fully defined in this case: @@ -134,9 +134,9 @@ }; }; -struct struct_in_array {}; +struct struct_in_array; -struct struct_in_array_typed {}; +struct struct_in_array_typed; typedef struct struct_in_array_typed struct_in_array_t[2]; @@ -189,3 +189,7 @@ struct struct_with_embedded_stuff _14; }; +struct struct_in_array {}; + +struct struct_in_array_typed {}; + ... #13/1 btf_dump: syntax:FAIL Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210319112554.794552-3-jean-philippe@linaro.org
2021-03-19	libbpf: Fix BTF dump of pointer-to-array-of-struct	Jean-Philippe Brucker
	The vmlinux.h generated from BTF is invalid when building drivers/phy/ti/phy-gmii-sel.c with clang: vmlinux.h:61702:27: error: array type has incomplete element type ‘struct reg_field’ 61702 \| const struct reg_field (regfields)[3]; \| ^~~~~~~~~ bpftool generates a forward declaration for this struct regfield, which compilers aren't happy about. Here's a simplified reproducer: struct inner { int val; }; struct outer { struct inner (ptr_to_array)[2]; } A; After build with clang -> bpftool btf dump c -> clang/gcc: ./def-clang.h:11:23: error: array has incomplete element type 'struct inner' struct inner (ptr_to_array)[2]; Member ptr_to_array of struct outer is a pointer to an array of struct inner. In the DWARF generated by clang, struct outer appears before struct inner, so when converting BTF of struct outer into C, bpftool issues a forward declaration to struct inner. With GCC the DWARF info is reversed so struct inner gets fully defined. That forward declaration is not sufficient when compilers handle an array of the struct, even when it's only used through a pointer. Note that we can trigger the same issue with an intermediate typedef: struct inner { int val; }; typedef struct inner inner2_t[2]; struct outer { inner2_t ptr_to_array; } A; Becomes: struct inner; typedef struct inner inner2_t[2]; And causes: ./def-clang.h:10:30: error: array has incomplete element type 'struct inner' typedef struct inner inner2_t[2]; To fix this, clear through_ptr whenever we encounter an intermediate array, to make the inner struct part of a strong link and force full declaration. Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210319112554.794552-2-jean-philippe@linaro.org
2021-03-19	libbpf: Add explicit padding to btf_dump_emit_type_decl_opts	KP Singh
	Similar to https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org/ When DECLARE_LIBBPF_OPTS is used with inline field initialization, e.g: DECLARE_LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts, .field_name = var_ident, .indent_level = 2, .strip_mods = strip_mods, ); and compiled in debug mode, the compiler generates code which leaves the padding uninitialized and triggers errors within libbpf APIs which require strict zero initialization of OPTS structs. Adding anonymous padding field fixes the issue. Fixes: 9f81654eebe8 ("libbpf: Expose BTF-to-C type declaration emitting API") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210319192117.2310658-1-kpsingh@kernel.org
2021-03-19	selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value	Hangbin Liu
	The ECN bit defines ECT(1) = 1, ECT(0) = 2. So inner 0x02 + outer 0x01 should be inner ECT(0) + outer ECT(1). Based on the description of __INET_ECN_decapsulate, the final decapsulate value should be ECT(1). So fix the test expect value to 0x01. Before the fix: TEST: VXLAN: ECN decap: 01/02->0x02 [FAIL] Expected to capture 10 packets, got 0. After the fix: TEST: VXLAN: ECN decap: 01/02->0x01 [ OK ] Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19	selftests/sgx: Improve error detection and messages	Dave Hansen
	The SGX device file (/dev/sgx_enclave) is unusual in that it requires execute permissions. It has to be both "chmod +x" and be on a filesystem without 'noexec'. In the future, udev and systemd should get updates to set up systems automatically. But, for now, nobody's systems do this automatically, and everybody gets error messages like this when running ./test_sgx: 0x0000000000000000 0x0000000000002000 0x03 0x0000000000002000 0x0000000000001000 0x05 0x0000000000003000 0x0000000000003000 0x03 mmap() failed, errno=1. That isn't very user friendly, even for forgetful kernel developers. Further, the test case is rather haphazard about its use of fprintf() versus perror(). Improve the error messages. Use perror() where possible. Lastly, do some sanity checks on opening and mmap()ing the device file so that we can get a decent error message out to the user. Now, if your user doesn't have permission, you'll get the following: $ ls -l /dev/sgx_enclave crw------- 1 root root 10, 126 Mar 18 11:29 /dev/sgx_enclave $ ./test_sgx Unable to open /dev/sgx_enclave: Permission denied If you then 'chown dave:dave /dev/sgx_enclave' (or whatever), but you leave execute permissions off, you'll get: $ ls -l /dev/sgx_enclave crw------- 1 dave dave 10, 126 Mar 18 11:29 /dev/sgx_enclave $ ./test_sgx no execute permissions on device file If you fix that with "chmod ug+x /dev/sgx" but you leave /dev as noexec, you'll get this: $ mount \| grep "/dev .noexec" udev on /dev type devtmpfs (rw,nosuid,noexec,...) $ ./test_sgx ERROR: mmap for exec: Operation not permitted mmap() succeeded for PROT_READ, but failed for PROT_EXEC check that user has execute permissions on /dev/sgx_enclave and that /dev does not have noexec set: 'mount \| grep "/dev .noexec"' That can be fixed with: mount -o remount,noexec /devESC Hopefully, the combination of better error messages and the search engines indexing this message will help people fix their systems until we do this properly. [ bp: Improve error messages more. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20210318194301.11D9A984@viggo.jf.intel.com
2021-03-18	selftests: net: forwarding: Fix a typo	Bhaskar Chowdhury
	s/verfied/verified/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18	selftests/bpf: Add multi-file statically linked BPF object file test	Andrii Nakryiko
	Add Makefile infra to specify multi-file BPF object files (and derivative skeletons). Add first selftest validating BPF static linker can merge together successfully two independent BPF object files and resulting object and skeleton are correct and usable. Use the same F(F(F(X))) = F(F(X)) identity test on linked object files as for the case of single BPF object files. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-13-andrii@kernel.org
2021-03-18	selftests/bpf: Pass all BPF .o's through BPF static linker	Andrii Nakryiko
	Pass all individual BPF object files (generated from progs/*.c) through `bpftool gen object` command to validate that BPF static linker doesn't corrupt them. As an additional sanity checks, validate that passing resulting object files through linker again results in identical ELF files. Exact same ELF contents can be guaranteed only after two passes, as after the first pass ELF sections order changes, and thus .BTF.ext data sections order changes. That, in turn, means that strings are added into the final BTF string sections in different order, so .BTF strings data might not be exactly the same. But doing another round of linking afterwards should result in the identical ELF file, which is checked with additional `diff` command. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-12-andrii@kernel.org
2021-03-18	selftests/bpf: Re-generate vmlinux.h and BPF skeletons if bpftool changed	Andrii Nakryiko
	Trigger vmlinux.h and BPF skeletons re-generation if detected that bpftool was re-compiled. Otherwise full `make clean` is required to get updated skeletons, if bpftool is modified. Fixes: acbd06206bbb ("selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-11-andrii@kernel.org
2021-03-18	bpftool: Add `gen object` command to perform BPF static linking	Andrii Nakryiko
	Add `bpftool gen object <output-file> <input_file>...` command to statically link multiple BPF ELF object files into a single output BPF ELF object file. This patch also updates bash completions and man page. Man page gets a short section on `gen object` command, but also updates the skeleton example to show off workflow for BPF application with two .bpf.c files, compiled individually with Clang, then resulting object files are linked together with `gen object`, and then final object file is used to generate usable BPF skeleton. This should help new users understand realistic workflow w.r.t. compiling mutli-file BPF application. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20210318194036.3521577-10-andrii@kernel.org
2021-03-18	bpftool: Add ability to specify custom skeleton object name	Andrii Nakryiko
	Add optional name OBJECT_NAME parameter to `gen skeleton` command to override default object name, normally derived from input file name. This allows much more flexibility during build time. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-9-andrii@kernel.org
2021-03-18	libbpf: Add BPF static linker BTF and BTF.ext support	Andrii Nakryiko
	Add .BTF and .BTF.ext static linking logic. When multiple BPF object files are linked together, their respective .BTF and .BTF.ext sections are merged together. BTF types are not just concatenated, but also deduplicated. .BTF.ext data is grouped by type (func info, line info, core_relos) and target section names, and then all the records are concatenated together, preserving their relative order. All the BTF type ID references and string offsets are updated as necessary, to take into account possibly deduplicated strings and types. BTF DATASEC types are handled specially. Their respective var_secinfos are accumulated separately in special per-section data and then final DATASEC types are emitted at the very end during bpf_linker__finalize() operation, just before emitting final ELF output file. BTF data can also provide "section annotations" for some extern variables. Such concept is missing in ELF, but BTF will have DATASEC types for such special extern datasections (e.g., .kconfig, .ksyms). Such sections are called "ephemeral" internally. Internally linker will keep metadata for each such section, collecting variables information, but those sections won't be emitted into the final ELF file. Also, given LLVM/Clang during compilation emits BTF DATASECS that are incomplete, missing section size and variable offsets for static variables, BPF static linker will initially fix up such DATASECs, using ELF symbols data. The final DATASECs will preserve section sizes and all variable offsets. This is handled correctly by libbpf already, so won't cause any new issues. On the other hand, it's actually a nice property to have a complete BTF data without runtime adjustments done during bpf_object__open() by libbpf. In that sense, BPF static linker is also a BTF normalizer. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
2021-03-18	libbpf: Add BPF static linker APIs	Andrii Nakryiko
	Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18	libbpf: Add generic BTF type shallow copy API	Andrii Nakryiko
	Add btf__add_type() API that performs shallow copy of a given BTF type from the source BTF into the destination BTF. All the information and type IDs are preserved, but all the strings encountered are added into the destination BTF and corresponding offsets are rewritten. BTF type IDs are assumed to be correct or such that will be (somehow) modified afterwards. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
2021-03-18	libbpf: Extract internal set-of-strings datastructure APIs	Andrii Nakryiko
	Extract BTF logic for maintaining a set of strings data structure, used for BTF strings section construction in writable mode, into separate re-usable API. This data structure is going to be used by bpf_linker to maintains ELF STRTAB section, which has the same layout as BTF strings section. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
2021-03-18	libbpf: Rename internal memory-management helpers	Andrii Nakryiko
	Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details of dynamically resizable memory to use libbpf_ prefix, as they are not BTF-specific. No functional changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
2021-03-18	libbpf: Generalize BTF and BTF.ext type ID and strings iteration	Andrii Nakryiko
	Extract and generalize the logic to iterate BTF type ID and string offset fields within BTF types and .BTF.ext data. Expose this internally in libbpf for re-use by bpf_linker. Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE access strings), which was previously missing. There previously was no case of deduplicating .BTF.ext data, but bpf_linker is going to use it. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
2021-03-18	libbpf: Expose btf_type_by_id() internally	Andrii Nakryiko
	btf_type_by_id() is internal-only convenience API returning non-const pointer to struct btf_type. Expose it outside of btf.c for re-use. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org