summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2023-05-07Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "Third version of perf tool updates, with the build problems with with using a 'vmlinux.h' generated from the main build fixed, and the bpf skeleton build disabled by default. Build: - Require libtraceevent to build, one can disable it using NO_LIBTRACEEVENT=1. It is required for tools like 'perf sched', 'perf kvm', 'perf trace', etc. libtraceevent is available in most distros so installing 'libtraceevent-devel' should be a one-time event to continue building perf as usual. Using NO_LIBTRACEEVENT=1 produces tooling that is functional and sufficient for lots of users not interested in those libtraceevent dependent features. - Allow Python support in 'perf script' when libtraceevent isn't linked, as not all features requires it, for instance Intel PT does not use tracepoints. - Error if the python interpreter needed for jevents to work isn't available and NO_JEVENTS=1 isn't set, preventing a build without support for JSON vendor events, which is a rare but possible condition. The two check error messages: $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.) $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.) - Make libbpf 1.0 the minimum required when building with out of tree, distro provided libbpf. - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++ demangler, add 'perf test' entry for it. - Make binutils libraries opt in, as distros disable building with it due to licensing, they were used for C++ demangling, for instance. - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or equivalent) isn't installed, we'll just have a build warning: Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev - Add a feature test for scandirat(), that is not implemented so far in musl and uclibc, disabling features that need it, such as scanning for tracepoints in /sys/kernel/tracing/events. perf BPF filters: - New feature where BPF can be used to filter samples, for instance: $ sudo ./perf record -e cycles --filter 'period > 1000' true $ sudo ./perf script perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms]) perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms]) perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms]) perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms]) perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms]) true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms]) true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) - In addition to 'period' (PERF_SAMPLE_PERIOD), the other PERF_SAMPLE_ can be used for filtering, and also some other sample accessible values, from tools/perf/Documentation/perf-record.txt: Essentially the BPF filter expression is: <term> <operator> <value> (("," | "||") <term> <operator> <value>)* The <term> can be one of: ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr, code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat, p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock, mem_dtlb, mem_blk, mem_hops The <operator> can be one of: ==, !=, >, >=, <, <=, & The <value> can be one of: <number> (for any term) na, load, store, pfetch, exec (for mem_op) l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl) na, none, hit, miss, hitm, fwd, peer (for mem_snoop) remote (for mem_remote) na, locked (for mem_locked) na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb) na, by_data, by_addr (for mem_blk) hops0, hops1, hops2, hops3 (for mem_hops) perf lock contention: - Show lock type with address. - Track and show mmap_lock, siglock and per-cpu rq_lock with address. This is done for mmap_lock by following the current->mm pointer: $ sudo ./perf lock con -abl -- sleep 10 contended total wait max wait avg wait address symbol ... 16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640 17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0 3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock 3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58 1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70 9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock 14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0 3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock 16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560 11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock 1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8 1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock 581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058 5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070 112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120 381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock 255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80 - Update default map size to 16384. - Allocate single letter option -M for --map-nr-entries, as it is proving being frequently used. - Fix struct rq lock access for older kernels with BPF's CO-RE (Compile once, run everywhere). - Fix problems found with MSAn. perf report/top: - Add inline information when using --call-graph=fp or lbr, as was already done to the --call-graph=dwarf callchain mode. - Improve the 'srcfile' sort key performance by really using an optimization introduced in 6.2 for the 'srcline' sort key that avoids calling addr2line for comparision with each sample. perf sched: - Make 'perf sched latency/map/replay' to use "sched:sched_waking" instead of "sched:sched_waking", consistent with 'perf record' since d566a9c2d482 ("perf sched: Prefer sched_waking event when it exists"). perf ftrace: - Make system wide the default target for latency subcommand, run the following command then generate some network traffic and press control+C: # perf ftrace latency -T __kfree_skb ^C DURATION | COUNT | GRAPH | 0 - 1 us | 27 | ############# | 1 - 2 us | 22 | ########### | 2 - 4 us | 8 | #### | 4 - 8 us | 5 | ## | 8 - 16 us | 24 | ############ | 16 - 32 us | 2 | # | 32 - 64 us | 1 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # perf top: - Add --branch-history (LBR: Last Branch Record) option, just like already available for 'perf record'. - Fix segfault in thread__comm_len() where thread->comm was being used outside thread->comm_lock. perf annotate: - Allow configuring objdump and addr2line in ~/.perfconfig., so that you can use alternative binaries, such as llvm's. perf kvm: - Add TUI mode for 'perf kvm stat report'. Reference counting: - Add reference count checking infrastructure to check for use after free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs, more to come. To build with it use -DREFCNT_CHECKING=1 in the make command line to build tools/perf. Documented at: https://perf.wiki.kernel.org/index.php/Reference_Count_Checking - The above caught, for instance, fix, present in this series: - Fix maps use after put in 'perf test "Share thread maps"': 'maps' is copied from leader, but the leader is put on line 79 and then 'maps' is used to read the reference count below - so a use after put, with the put of maps happening within thread__put. Fixed by reversing the order of puts so that the leader is put last. - Also several fixes were made to places where reference counts were not being held. - Make this one of the tests in 'make -C tools/perf build-test' to regularly build test it and to make sure no direct access to the reference counted structs are made, doing that via accessors to check the validity of the struct pointer. ARM64: - Fix 'perf report' segfault when filtering coresight traces by sparse lists of CPUs. - Add support for 'simd' as a sort field for 'perf report', to show ARM's NEON SIMD's predicate flags: "partial" and "empty". arm64 vendor events: - Add N1 metrics. Intel vendor events: - Add graniterapids, grandridge and sierraforrest events. - Refresh events for: alderlake, aldernaken, broadwell, broadwellde, broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex, jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids, silvermont, skylake, tigerlake and westmereep-dp - Refresh metrics for alderlake-n, broadwell, broadwellde, broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and skylakex. perf stat: - Implement --topdown using JSON metrics. - Add TopdownL1 JSON metric as a default if present, but disable it for now for some Intel hybrid architectures, a series of patches addressing this is being reviewed and will be submitted for v6.5. - Use metrics for --smi-cost. - Update topdown documentation. Vendor events (JSON) infrastructure: - Add support for computing and printing metric threshold values. For instance, here is one found in thesapphirerapids json file: { "BriefDescription": "Percentage of cycles spent in System Management Interrupts.", "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)", "MetricGroup": "smi", "MetricName": "smi_cycles", "MetricThreshold": "smi_cycles > 0.1", "ScaleUnit": "100%" }, - Test parsing metric thresholds with the fake PMU in 'perf test pmu-events'. - Support for printing metric thresholds in 'perf list'. - Add --metric-no-threshold option to 'perf stat'. - Add rand (reverse and) and has_pmem (optane memory) support to metrics. - Sort list of input files to avoid depending on the order from readdir() helping in obtaining reproducible builds. S/390: - Add common metrics: - CPI (cycles per instruction), prbstate (ratio of instructions executed in problem state compared to total number of instructions), l1mp (Level one instruction and data cache misses per 100 instructions). - Add cache metrics for z13, z14, z15 and z16. - Add metric for TLB and cache. ARM: - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE (Memory Tagging Extension) and MOPS (Memory Operations) load/store. Intel PT hardware tracing: - Add event type names UINTR (User interrupt delivered) and UIRET (Exiting from user interrupt routine), documented in table 32-50 "CFE Packet Type and Vector Fields Details" in the Intel Processor Trace chapter of The Intel SDM Volume 3 version 078. - Add support for new branch instructions ERETS and ERETU. - Fix CYC timestamps after standalone CBR ARM CoreSight hardware tracing: - Allow user to override timestamp and contextid settings. - Fix segfault in dso lookup. - Fix timeless decode mode detection. - Add separate decode paths for timeless and per-thread modes. auxtrace: - Fix address filter entire kernel size. Miscellaneous: - Fix use-after-free and unaligned bugs in the PLT handling routines. - Use zfree() to reduce chances of use after free. - Add missing 0x prefix for addresses printed in hexadecimal in 'perf probe'. - Suppress massive unsupported target platform errors in the unwind code. - Fix return incorrect build_id size in elf_read_build_id(). - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 . - Add missing new parameter in kfree_skb tracepoint to the python scripts using it. - Add 'perf bench syscall fork' benchmark. - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in 'perf mem'. - Fix wrong size expectation for perf test 'Setup struct perf_event_attr' caused by the patch adding perf_event_attr::config3. - Fix some spelling mistakes" * tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL" Revert "perf build: Warn for BPF skeletons if endian mismatches" perf metrics: Fix SEGV with --for-each-cgroup perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE perf stat: Separate bperf from bpf_profiler perf test record+probe_libc_inet_pton: Fix call chain match on x86_64 perf test record+probe_libc_inet_pton: Fix call chain match on s390 perf tracepoint: Fix memory leak in is_valid_tracepoint() perf cs-etm: Add fix for coresight trace for any range of CPUs perf build: Fix unescaped # in perf build-test perf unwind: Suppress massive unsupported target platform errors perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it perf script: Print raw ip instead of binary offset for callchain perf symbols: Fix return incorrect build_id size in elf_read_build_id() perf list: Modify the warning message about scandirat(3) perf list: Fix memory leaks in print_tracepoint_events() perf lock contention: Rework offset calculation with BPF CO-RE perf lock contention: Fix struct rq lock access perf stat: Disable TopdownL1 on hybrid perf stat: Avoid SEGV on counter->name ...
2023-05-06Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"Arnaldo Carvalho de Melo
This reverts commit a980755beb5aca9002e1c95ba519b83a44242b5b. We need to better polish building with BPF skels, so revert back to making it an experimental feature that has to be explicitely enabled using BUILD_BPF_SKEL=1. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-06Revert "perf build: Warn for BPF skeletons if endian mismatches"Arnaldo Carvalho de Melo
This reverts commit 51924ae69eea5bc90b5da525fbcf4bbd5f8551b3. We need to better polish building with BPF skels, so revert back to making it an experimental feature that has to be explicitely enabled using BUILD_BPF_SKEL=1. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-05Merge tag 'net-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - sched: act_pedit: free pedit keys on bail from offset check Current release - new code bugs: - pds_core: - Kconfig fixes (DEBUGFS and AUXILIARY_BUS) - fix mutex double unlock in error path Previous releases - regressions: - sched: cls_api: remove block_cb from driver_list before freeing - nf_tables: fix ct untracked match breakage - eth: mtk_eth_soc: drop generic vlan rx offload - sched: flower: fix error handler on replace Previous releases - always broken: - tcp: fix skb_copy_ubufs() vs BIG TCP - ipv6: fix skb hash for some RST packets - af_packet: don't send zero-byte data in packet_sendmsg_spkt() - rxrpc: timeout handling fixes after moving client call connection to the I/O thread - ixgbe: fix panic during XDP_TX with > 64 CPUs - igc: RMW the SRRCTL register to prevent losing timestamp config - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621 - r8152: - fix flow control issue of RTL8156A - fix the poor throughput for 2.5G devices - move setting r8153b_rx_agg_chg_indicate() to fix coalescing - enable autosuspend - ncsi: clear Tx enable mode when handling a Config required AEN - octeontx2-pf: macsec: fixes for CN10KB ASIC rev Misc: - 9p: remove INET dependency" * tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop() pds_core: fix mutex double unlock in error path net/sched: flower: fix error handler on replace Revert "net/sched: flower: Fix wrong handle assignment during filter change" net/sched: flower: fix filter idr initialization net: fec: correct the counting of XDP sent frames bonding: add xdp_features support net: enetc: check the index of the SFI rather than the handle sfc: Add back mailing list virtio_net: suppress cpu stall when free_unused_bufs ice: block LAN in case of VF to VF offload net: dsa: mt7530: fix network connectivity with multiple CPU ports net: dsa: mt7530: fix corrupt frames using trgmii on 40 MHz XTAL MT7621 9p: Remove INET dependency netfilter: nf_tables: fix ct untracked match breakage af_packet: Don't send zero-byte data in packet_sendmsg_spkt(). igc: read before write to SRRCTL register pds_core: add AUXILIARY_BUS and NET_DEVLINK to Kconfig pds_core: remove CONFIG_DEBUG_FS from makefile ionic: catch failure from devlink_alloc ...
2023-05-05perf metrics: Fix SEGV with --for-each-cgroupIan Rogers
Ensure the metric threshold is copied correctly or else a use of uninitialized memory happens. Fixes: d0a3052f6faefffc ("perf metric: Compute and print threshold values") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230505204119.3443491-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-05perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used ↵Arnaldo Carvalho de Melo
structs + CO-RE Linus reported a build break due to using a vmlinux without a BTF elf section to generate the vmlinux.h header with bpftool for use in the BPF tools in tools/perf/util/bpf_skel/*.bpf.c. Instead add a vmlinux.h file with the structs needed with the fields the tools need, marking the structs with __attribute__((preserve_access_index)), so that libbpf's CO-RE code can fixup the struct field offsets. In some cases the vmlinux.h file that was being generated by bpftool from the kernel BTF information was not needed at all, just including linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI types were not being used. To keep te patch small, include those UAPI headers from the trimmed down vmlinux.h file, that then provides the tools with just the structs and the subset of its fields needed for them. Testing it: # perf lock contention -b find / > /dev/null ^C contended total wait max wait avg wait type caller 7 53.59 us 10.86 us 7.66 us rwlock:R start_this_handle+0xa0 2 30.35 us 21.99 us 15.17 us rwsem:R iterate_dir+0x52 1 9.04 us 9.04 us 9.04 us rwlock:W start_this_handle+0x291 1 8.73 us 8.73 us 8.73 us spinlock raw_spin_rq_lock_nested+0x1e # # perf lock contention -abl find / > /dev/null ^C contended total wait max wait avg wait address symbol 1 262.96 ms 262.96 ms 262.96 ms ffff8e67502d0170 (mutex) 12 244.24 us 39.91 us 20.35 us ffff8e6af56f8070 mmap_lock (rwsem) 7 30.28 us 6.85 us 4.33 us ffff8e6c865f1d40 rq_lock (spinlock) 3 7.42 us 4.03 us 2.47 us ffff8e6c864b1d40 rq_lock (spinlock) 2 3.72 us 2.19 us 1.86 us ffff8e6c86571d40 rq_lock (spinlock) 1 2.42 us 2.42 us 2.42 us ffff8e6c86471d40 rq_lock (spinlock) 4 2.11 us 559 ns 527 ns ffffffff9a146c80 rcu_state (spinlock) 3 1.45 us 818 ns 482 ns ffff8e674ae8384c (rwlock) 1 870 ns 870 ns 870 ns ffff8e68456ee060 (rwlock) 1 663 ns 663 ns 663 ns ffff8e6c864f1d40 rq_lock (spinlock) 1 573 ns 573 ns 573 ns ffff8e6c86531d40 rq_lock (spinlock) 1 472 ns 472 ns 472 ns ffff8e6c86431740 (spinlock) 1 397 ns 397 ns 397 ns ffff8e67413a4f04 (spinlock) # # perf test offcpu 95: perf record offcpu profiling tests : Ok # # perf kwork latency --use-bpf Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- (w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s | (w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s | (w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s | (w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s | (w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s | (w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s | (w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s | (w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s | (w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s | (w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s | (w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s | (w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s | (w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s | <SNIP> # strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1 Performance counter stats for 'sleep 1': 6,197,983 cycles 1.003922848 seconds time elapsed 0.000000000 seconds user 0.002032000 seconds sys # head -7 perf-stat-bpf-counters.output bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3 bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0 bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory) bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, +func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, +func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4 # Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Song Liu <song@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/lkml/ZFU1PJrn8YtHIqno@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-05perf stat: Separate bperf from bpf_profilerDmitrii Dolgov
It seems that perf stat -b <prog id> doesn't produce any results: $ perf stat -e cycles -b 4 -I 10000 -vvv Control descriptor is not initialized cycles: 0 0 0 time counts unit events 10.007641640 <not supported> cycles Looks like this happens because fentry/fexit progs are getting loaded, but the corresponding perf event is not enabled and not added into the events bpf map. I think there is some mixing up between two type of bpf support, one for bperf and one for bpf_profiler. Both are identified via evsel__is_bpf, based on which perf events are enabled, but for the latter (bpf_profiler) a perf event is required. Using evsel__is_bperf to check only bperf produces expected results: $ perf stat -e cycles -b 4 -I 10000 -vvv Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: size 136 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ [...perf_event_attr for other CPUs...] ------------------------------------------------------------ cycles: 309426 169009 169009 time counts unit events 10.010091271 309426 cycles The final numbers correspond (at least in the level of magnitude) to the same metric obtained via bpftool. Fixes: 112cb56164bc2108 ("perf stat: Introduce config stat.bpf-counter-events") Reviewed-by: Song Liu <song@kernel.org> Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Tested-by: Song Liu <song@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230412182316.11628-1-9erthalion6@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-04Merge tag 'mm-stable-2023-05-03-16-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - Some DAMON cleanups from Kefeng Wang - Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE ioctl's behavior more similar to KSM's behavior. [ Andrew called these "final", but I suspect we'll have a series fixing up the fact that the last commit in the dmapools series in the previous pull seems to have unintentionally just reverted all the other commits in the same series.. - Linus ] * tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: hwpoison: coredump: support recovery from dump_user_range() mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() mm/ksm: move disabling KSM from s390/gmap code to KSM code selftests/ksm: ksm_functional_tests: add prctl unmerge test mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 mm/damon/paddr: fix missing folio_sz update in damon_pa_young() mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate() mm/damon/paddr: minor refactor of damon_pa_pageout()
2023-05-04Merge tag 'loongarch-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Better backtraces for humanization - Relay BCE exceptions to userland as SIGSEGV - Provide kernel fpu functions - Optimize memory ops (memset/memcpy/memmove) - Optimize checksum and crc32(c) calculation - Add ARCH_HAS_FORTIFY_SOURCE selection - Add function error injection support - Add ftrace with direct call support - Add basic perf tools support * tag 'loongarch-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (24 commits) tools/perf: Add basic support for LoongArch LoongArch: ftrace: Add direct call trampoline samples support LoongArch: ftrace: Add direct call support LoongArch: ftrace: Implement ftrace_find_callable_addr() to simplify code LoongArch: ftrace: Fix build error if DYNAMIC_FTRACE_WITH_REGS is not set LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses LoongArch: Add support for function error injection LoongArch: Add ARCH_HAS_FORTIFY_SOURCE selection LoongArch: crypto: Add crc32 and crc32c hw acceleration LoongArch: Add checksum optimization for 64-bit system LoongArch: Optimize memory ops (memset/memcpy/memmove) LoongArch: Provide kernel fpu functions LoongArch: Relay BCE exceptions to userland as SIGSEGV with si_code=SEGV_BNDERR LoongArch: Tweak the BADV and CPUCFG.PRID lines in show_regs() LoongArch: Humanize the ESTAT line when showing registers LoongArch: Humanize the ECFG line when showing registers LoongArch: Humanize the EUEN line when showing registers LoongArch: Humanize the PRMD line when showing registers LoongArch: Humanize the CRMD line when showing registers LoongArch: Fix format of CSR lines during show_regs() ...
2023-05-03perf test record+probe_libc_inet_pton: Fix call chain match on x86_64Thomas Richter
The test case probe libc's inet_pton & backtrace it with ping fails with Fedora 38 on x86_64. Function getaddrinfo() does not show up in the call chain anymore: # ./perf script ping 1803 [000] 728.567146: probe_libc:inet_pton: (7f5275afc840) 133840 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) 27b4a __libc_start_call_main+0x7a (/usr/lib64/libc.so.6) 27c0b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6) ping 1803 [000] 728.567184: probe_libc:inet_pton: (7f5275afc840) 133840 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) 493e main+0xcde (/usr/bin/ping) 27b4a __libc_start_call_main+0x7a (/usr/lib64/libc.so.6) # which causes the test case to fail. Remove function getaddrinfo() from list of expected functions. Output before: # ./perf test 'libc' 91: probe libc's inet_pton & backtrace it with ping : FAILED! # Output after: # ./perf test 'libc' 91: probe libc's inet_pton & backtrace it with ping : Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20230503081255.3372986-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-03perf test record+probe_libc_inet_pton: Fix call chain match on s390Thomas Richter
With Fedora 38 the perf test 86 probe libc's inet_pton fails on s390. The call chain of the ping command changed. The functions text_to_binary_address() and gaih_inet() do not show up in the call chain anymore. Output before: # ./perf test -v 86 86: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 541050 fgrep: warning: fgrep is obsolescent; using grep -F fgrep: warning: fgrep is obsolescent; using grep -F BFD: DWARF error: could not find variable specification at offset 0x22011 ... ping 541078 [002] 348826.679581: probe_libc:inet_pton_1: (3ffad84b940) 14b940 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) 10e9c3 __GI_getaddrinfo+0xeb3 (inlined) 4397 main+0x737 (/usr/bin/ping) FAIL: expected backtrace entry "gaih_inet.*\+0x[[:xdigit:]]\ +[[:space:]]\(/usr/lib64/libc.so.6|inlined\)$" got "4397 main+0x737 (/usr/bin/ping)" test child finished with -1 ---- end ---- probe libc's inet_pton & backtrace it with ping: FAILED! # Output after: # ./perf test -v 86 86: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 541098 fgrep: warning: fgrep is obsolescent; using grep -F fgrep: warning: fgrep is obsolescent; using grep -F BFD: DWARF error: could not find variable specification at offset 0x309d1 ... ping 541126 [006] 349309.099067: probe_libc:inet_pton_1: (3ffb7f4b940) 14b940 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) 10e9c3 __GI_getaddrinfo+0xeb3 (inlined) 4397 main+0x737 (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20230503081134.3372415-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-03selftests: netfilter: fix libmnl pkg-config usageJeremy Sowden
1. Don't hard-code pkg-config 2. Remove distro-specific default for CFLAGS 3. Use pkg-config for LDLIBS Fixes: a50a88f026fb ("selftests: netfilter: fix a build error on openSUSE") Suggested-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-05-02selftests/ksm: ksm_functional_tests: add prctl unmerge testDavid Hildenbrand
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1 will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting MADV_MERGEABLE would. Link: https://lkml.kernel.org/r/20230422205420.30372-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02perf tracepoint: Fix memory leak in is_valid_tracepoint()Yang Jihong
When is_valid_tracepoint() returns 1, need to call put_events_file() to free `dir_path`. Fixes: 25a7d914274de386 ("perf parse-events: Use get/put_events_file()") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230421025953.173826-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf cs-etm: Add fix for coresight trace for any range of CPUsGanapatrao Kulkarni
The current implementation supports coresight trace decode for a range of CPUs, if the first CPU is CPU0. Perf report segfaults, if tried for sparse CPUs list and also for any range of CPUs(non zero first CPU). Adding a fix to perf report for any range of CPUs and for sparse list. Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20230421055253.83912-1-gankulkarni@os.amperecomputing.com Cc: suzuki.poulose@arm.com Cc: acme@kernel.org Cc: mathieu.poirier@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: darren@os.amperecomputing.com Cc: scclevenger@os.amperecomputing.com Cc: scott@os.amperecomputing.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf build: Fix unescaped # in perf build-testJames Clark
With the following bash and make versions: $ make --version GNU Make 4.2.1 Built for aarch64-unknown-linux-gnu $ bash --version GNU bash, version 5.0.17(1)-release (aarch64-unknown-linux-gnu) This error is encountered when running the build-test target: $ make -C tools/perf build-test tests/make:181: *** unterminated call to function 'shell': missing ')'. Stop. make: *** [Makefile:103: build-test] Error 2 Fix it by escaping the # which was causing make to interpret the rest of the line as a comment leaving the unclosed opening bracket. Fixes: 56d5229471ee1634 ("tools build: Pass libbpf feature only if libbpf 1.0+") Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230425104414.1723571-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf unwind: Suppress massive unsupported target platform errorsChangbin Du
When cross-analyzing perf data recorded on an another platform, massive unsupported target platform errors are printed. So let's show this message as warning and only once. Signed-off-by: Changbin Du <changbin.du@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hui Wang <hw.huiwang@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230426032246.3608596-1-changbin.du@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf script: Add new parameter in kfree_skb tracepoint to the python scripts ↵Sriram Yagnaraman
using it Include reason parameter that was added in commit c504e5c2f9648a1e ("net: skb: introduce kfree_skb_reason()") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Link: https://lore.kernel.org/r/20230426104149.14089-1-sriram.yagnaraman@est.tech Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf script: Print raw ip instead of binary offset for callchainChangbin Du
Before this, the raw ip is printed for non-callchain and dso offset for callchain. This inconsistent output for address may confuse people. And mostly what we expect is the raw ip. 'dso offset' is printed in callchain: $ perf script ... ls 1341034 2739463.008343: 2162417 cycles: ffffffff99d657a7 [unknown] ([unknown]) ffffffff99e00b67 [unknown] ([unknown]) 235d3 memset+0x53 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) # dso offset a61b _dl_map_object+0x1bb (/usr/lib/x86_64-linux-gnu/ld-2.31.so) raw ip is printed for non-callchain: $ perf script -G ... ls 1341034 2739463.008876: 2053304 cycles: ffffffffc1596923 [unknown] ([unknown]) ls 1341034 2739463.009381: 1917049 cycles: 14def8e149e6 __strcoll_l+0xd96 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) # raw ip Let's have consistent output for it. Later I'll add a new field 'dsoff' to print dso offset. Signed-off-by: Changbin Du <changbin.du@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hui Wang <hw.huiwang@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230418031825.1262579-2-changbin.du@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf symbols: Fix return incorrect build_id size in elf_read_build_id()Yang Jihong
In elf_read_build_id(), if gnu build_id is found, should return the size of the actually copied data. If descsz is greater thanBuild_ID_SIZE, write_buildid data access may occur. Fixes: be96ea8ffa788dcc ("perf symbols: Fix issue with binaries using 16-bytes buildids (v2)") Reported-by: Will Ochowicz <Will.Ochowicz@genusplc.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Will Ochowicz <Will.Ochowicz@genusplc.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/CWLP265MB49702F7BA3D6D8F13E4B1A719C649@CWLP265MB4970.GBRP265.PROD.OUTLOOK.COM/T/ Link: https://lore.kernel.org/r/20230427012841.231729-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf list: Modify the warning message about scandirat(3)Namhyung Kim
It should mention scandirat() instead of scandir(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230427230502.1526136-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf list: Fix memory leaks in print_tracepoint_events()Namhyung Kim
It should free entries (not only the array) filled by scandirat() after use. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230427230502.1526136-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf lock contention: Rework offset calculation with BPF CO-RENamhyung Kim
It seems BPF CO-RE reloc doesn't work well with the pattern that gets the field-offset only. Use offsetof() to make it explicit so that the compiler would generate the correct code. Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Link: https://lore.kernel.org/r/20230427234833.1576130-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-02perf lock contention: Fix struct rq lock accessNamhyung Kim
The BPF CO-RE's ignore suffix rule requires three underscores. Otherwise it'd fail like below: $ sudo perf lock contention -ab libbpf: prog 'collect_lock_syms': BPF program load failed: Invalid argument libbpf: prog 'collect_lock_syms': -- BEGIN PROG LOAD LOG -- reg type unsupported for arg#0 function collect_lock_syms#380 ; int BPF_PROG(collect_lock_syms) 0: (b7) r6 = 0 ; R6_w=0 1: (b7) r7 = 0 ; R7_w=0 2: (b7) r9 = 1 ; R9_w=1 3: <invalid CO-RE relocation> failed to resolve CO-RE relocation <byte_off> [381] struct rq__new.__lock (0:0 @ offset 0) Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230427234833.1576130-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-05-01tools/perf: Add basic support for LoongArchHuacai Chen
Add basic support for LoongArch, which is very similar to the MIPS version. Signed-off-by: Ming Wang <wangming01@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-30Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds
Pull compute express link updates from Dan Williams: "DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's blessing, and the CXL core continues to mature its media management capabilities with support for listing and injecting media errors. Some late fixes that missed v6.3-final are also included: - Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle mailbox) to be a facility of the PCI core rather than the CXL core. This is foundational for upcoming support for PCI device-attestation and PCIe / CXL link encryption. - Add support for retrieving and injecting poison for CXL memory expanders. This enabling uses trace-events to convey CXL media error records to user tooling. It includes translation of device-local addresses (DPA) to system physical addresses (SPA) and their corresponding CXL region. - Fixes for decoder enumeration that missed v6.3-final - Miscellaneous fixups" * tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits) cxl/test: Add mock test for set_timestamp cxl/mbox: Update CMD_RC_TABLE tools/testing/cxl: Require CONFIG_DEBUG_FS tools/testing/cxl: Add a sysfs attr to test poison inject limits tools/testing/cxl: Use injected poison for get poison list tools/testing/cxl: Mock the Clear Poison mailbox command tools/testing/cxl: Mock the Inject Poison mailbox command cxl/mem: Add debugfs attributes for poison inject and clear cxl/memdev: Trace inject and clear poison as cxl_poison events cxl/memdev: Warn of poison inject or clear to a mapped region cxl/memdev: Add support for the Clear Poison mailbox command cxl/memdev: Add support for the Inject Poison mailbox command tools/testing/cxl: Mock support for Get Poison List cxl/trace: Add an HPA to cxl_poison trace events cxl/region: Provide region info to the cxl_poison trace event cxl/memdev: Add trigger_poison_list sysfs attribute cxl/trace: Add TRACE support for CXL media-error records cxl/mbox: Add GET_POISON_LIST mailbox command cxl/mbox: Initialize the poison state cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all ...
2023-04-29Merge tag 'cgroup-for-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cpuset changes including the fix for an incorrect interaction with CPU hotplug and an optimization - Other doc and cosmetic changes * tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup-v1/cpusets: update libcgroup project link cgroup/cpuset: Minor updates to test_cpuset_prs.sh cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset cpuset: Clean up cpuset_node_allowed cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
2023-04-28perf stat: Disable TopdownL1 on hybridIan Rogers
Bugs with event parsing, event grouping and metrics causes the TopdownL1 metricgroup to crash the perf command. Temporarily disable the group if no events/metrics are spcecified. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230428073809.1803624-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-28perf stat: Avoid SEGV on counter->nameIan Rogers
Switch to use evsel__name() that doesn't return NULL for hardware and similar events. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230426070050.1315519-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-28Merge tag 'riscv-for-linus-6.4-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for runtime detection of the Svnapot extension - Support for Zicboz when clearing pages - We've moved to GENERIC_ENTRY - Support for !MMU on rv32 systems - The linear region is now mapped via huge pages - Support for building relocatable kernels - Support for the hwprobe interface - Various fixes and cleanups throughout the tree * tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits) RISC-V: hwprobe: Explicity check for -1 in vdso init RISC-V: hwprobe: There can only be one first riscv: Allow to downgrade paging mode from the command line dt-bindings: riscv: add sv57 mmu-type RISC-V: hwprobe: Remove __init on probe_vendor_features() riscv: Use --emit-relocs in order to move .rela.dyn in init riscv: Check relocations at compile time powerpc: Move script to check relocations at compile time in scripts/ riscv: Introduce CONFIG_RELOCATABLE riscv: Move .rela.dyn outside of init to avoid empty relocations riscv: Prepare EFI header for relocatable kernels riscv: Unconditionnally select KASAN_VMALLOC if KASAN riscv: Fix ptdump when KASAN is enabled riscv: Fix EFI stub usage of KASAN instrumented strcmp function riscv: Move DTB_EARLY_BASE_VA to the kernel address space riscv: Rework kasan population functions riscv: Split early and final KASAN population functions riscv: Use PUD/P4D/PGD pages for the linear mapping riscv: Move the linear mapping creation in its own function riscv: Get rid of riscv_pfn_base variable ...
2023-04-28Merge tag 'powerpc-6.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for building the kernel using PC-relative addressing on Power10. - Allow HV KVM guests on Power10 to use prefixed instructions. - Unify support for the P2020 CPU (85xx) into a single machine description. - Always build the 64-bit kernel with 128-bit long double. - Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker. - A series fixing VFIO on Power since some generic changes. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and Timothy Pearson. * tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits) powerpc/64s: Disable pcrel code model on Clang powerpc: Fix merge conflict between pcrel and copy_thread changes powerpc/configs/powernv: Add IGB=y powerpc/configs/64s: Drop JFS Filesystem powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest powerpc/configs: Make pseries_le an alias for ppc64le_guest powerpc/configs: Incorporate generic kvm_guest.config into guest configs powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs powerpc/configs/64s: Enable Device Mapper options powerpc/configs/64s: Enable PSTORE powerpc/configs/64s: Enable VLAN support powerpc/configs/64s: Enable BLK_DEV_NVME powerpc/configs/64s: Drop REISERFS powerpc/configs/64s: Use SHA512 for module signatures powerpc/configs/64s: Enable IO_STRICT_DEVMEM powerpc/configs/64s: Enable SCHEDSTATS powerpc/configs/64s: Enable DEBUG_VM & other options powerpc/configs/64s: Enable EMULATED_STATS powerpc/configs/64s: Enable KUNIT and most tests ...
2023-04-28Merge tag 'trace-tools-v6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt: - Add auto-analysis only option to rtla/timerlat Add an --aa-only option to the tooling to perform only the auto analysis and not to parse and format the data. - Other minor fixes and clean ups * tag 'trace-tools-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Fix "Previous IRQ" auto analysis' line rtla/timerlat: Add auto-analysis only option rv: Remove redundant assignment to variable retval rv: Fix addition on an uninitialized variable 'run' rtla: Add .gitignore file
2023-04-28Merge tag 'trace-v6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There are users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. * tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits) ring-buffer: Sync IRQ works before buffer destruction tracing: Add missing spaces in trace_print_hex_seq() ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus recordmcount: Fix memory leaks in the uwrite function tracing/user_events: Limit max fault-in attempts tracing/user_events: Prevent same address and bit per process tracing/user_events: Ensure bit is cleared on unregister tracing/user_events: Ensure write index cannot be negative seq_buf: Add seq_buf_do_printk() helper tracing: Fix print_fields() for __dyn_loc/__rel_loc tracing/user_events: Set event filter_type from type ring-buffer: Clearly check null ptr returned by rb_set_head_page() tracing: Unbreak user events tracing/user_events: Use print_format_fields() for trace output tracing/user_events: Align structs with tabs for readability tracing/user_events: Limit global user_event count tracing/user_events: Charge event allocs to cgroups tracing/user_events: Update documentation for ABI tracing/user_events: Use write ABI in example tracing/user_events: Add ABI self-test ...
2023-04-28Merge tag 'objtool-core-2023-04-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ...
2023-04-28Merge tag 'x86_mm_for_6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 LAM (Linear Address Masking) support from Dave Hansen: "Add support for the new Linear Address Masking CPU feature. This is similar to ARM's Top Byte Ignore and allows userspace to store metadata in some bits of pointers without masking it out before use" * tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA selftests/x86/lam: Add test cases for LAM vs thread creation selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking selftests/x86/lam: Add inherit test cases for linear-address masking selftests/x86/lam: Add io_uring test cases for linear-address masking selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking x86/mm/iommu/sva: Make LAM and SVA mutually exclusive iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() mm: Expose untagging mask in /proc/$PID/status x86/mm: Provide arch_prctl() interface for LAM x86/mm: Reduce untagged_addr() overhead for systems without LAM x86/uaccess: Provide untagged_addr() and remove tags before address check mm: Introduce untagged_addr_remote() x86/mm: Handle LAM on context switch x86: CPUID and CR3/CR4 flags for Linear Address Masking x86: Allow atomic MM_CONTEXT flags setting x86/mm: Rework address range check in get_user() and put_user()
2023-04-28selftests: srv6: make srv6_end_dt46_l3vpn_test more robustAndrea Mayer
On some distributions, the rp_filter is automatically set (=1) by default on a netdev basis (also on VRFs). In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using the table associated with the VRF bound to that tunnel. During lookup operations, the rp_filter can lead to packet loss when activated on the VRF. Therefore, we chose to make this selftest more robust by explicitly disabling the rp_filter during tests (as it is automatically set by some Linux distributions). Fixes: 03a0b567a03d ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-27Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly singleton patches all over the place. Series of note are: - updates to scripts/gdb from Glenn Washburn - kexec cleanups from Bjorn Helgaas" * tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits) mailmap: add entries for Paul Mackerras libgcc: add forward declarations for generic library routines mailmap: add entry for Oleksandr ocfs2: reduce ioctl stack usage fs/proc: add Kthread flag to /proc/$pid/status ia64: fix an addr to taddr in huge_pte_offset() checkpatch: introduce proper bindings license check epoll: rename global epmutex scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry() scripts/gdb: create linux/vfs.py for VFS related GDB helpers uapi/linux/const.h: prefer ISO-friendly __typeof__ delayacct: track delays from IRQ/SOFTIRQ scripts/gdb: timerlist: convert int chunks to str scripts/gdb: print interrupts scripts/gdb: raise error with reduced debugging information scripts/gdb: add a Radix Tree Parser lib/rbtree: use '+' instead of '|' for setting color. proc/stat: remove arch_idle_time() checkpatch: check for misuse of the link tags checkpatch: allow Closes tags with links ...
2023-04-27Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-27Merge tag 'sh-for-v6.4-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "This is a bit larger than my previous one and mainly consists of clean-up work in the arch/sh directory by Geert Uytterhoeven and Randy Dunlap. Additionally, this fixes a bug in the Storage Queue code that was discovered while I was reviewing a patch to switch the code to the bitmap API by Christophe Jaillet. So this contains both a fix for the original bug in the Storage Queue code that can be backported later as well as the Christophe's patch to swich the code to the bitmap API. Summary: - Use generic GCC library routines - sq: Use the bitmap API when applicable - sq: Fix incorrect element size for allocating bitmap buffer - pci: Remove unused variable in SH-7786 PCI Express code - mcount.S: fix build error when PRINTK is not enabled - remove sh5/sh64 last fragments - math-emu: fix macro redefined warning - init: use OF_EARLY_FLATTREE for early init - nmi_debug: fix return value of __setup handler - SH2007: drop the bad URL info" * tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h> sh: Use generic GCC library routines sh: sq: Use the bitmap API when applicable sh: sq: Fix incorrect element size for allocating bitmap buffer sh: pci: Remove unused variable in SH-7786 PCI Express code sh: mcount.S: fix build error when PRINTK is not enabled sh: remove sh5/sh64 last fragments sh: math-emu: fix macro redefined warning sh: init: use OF_EARLY_FLATTREE for early init sh: nmi_debug: fix return value of __setup handler sh: SH2007: drop the bad URL info
2023-04-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "virtio,vhost,vdpa: features, fixes, and cleanups: - reduction in interrupt rate in virtio - perf improvement for VDUSE - scalability for vhost-scsi - non power of 2 ring support for packed rings - better management for mlx5 vdpa - suspend for snet - VIRTIO_F_NOTIFICATION_DATA - shared backend with vdpa-sim-blk - user VA support in vdpa-sim - better struct packing for virtio and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits) vhost_vdpa: fix unmap process in no-batch mode MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS tools/virtio: fix build caused by virtio_ring changes virtio_ring: add a struct device forward declaration vdpa_sim_blk: support shared backend vdpa_sim: move buffer allocation in the devices vdpa/snet: use likely/unlikely macros in hot functions vdpa/snet: implement kick_vq_with_data callback virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support virtio: add VIRTIO_F_NOTIFICATION_DATA feature support vdpa/snet: support the suspend vDPA callback vdpa/snet: support getting and setting VQ state MAINTAINERS: add vringh.h to Virtio Core and Net Drivers vringh: address kdoc warnings vdpa: address kdoc warnings virtio_ring: don't update event idx on get_buf vdpa_sim: add support for user VA vdpa_sim: replace the spinlock with a mutex to protect the state vdpa_sim: use kthread worker vdpa_sim: make devices agnostic for work management ...
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-27Merge tag 'for-linus-2023042601' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID updates from Jiri Kosina: - import a bunch of HID selftests from out-of-tree hid-tools project (Benjamin Tissoires) - drastically reducing Bluetooth disconnects on hid-nintendo driven devices (Daniel J. Ogorchock) - lazy initialization of battery interfaces in wacom driver (Jason Gerecke) - generic support for all Kye tablets (David Yang) - proper rumble queue overrun handling in hid-nintendo (Daniel J. Ogorchock) - support for ADC measurement in logitech-hidpp driver (Bastien Nocera) - reset GPIO support in i2c-hid (Hans de Goede) - improved handling of generic "Digitizer" usage (Jason Gerecke) - support for KEY_CAMERA_FOCUS (Feng Qi) - quirks for Apple Geyser 3 and Apple Geyser 4 (Alex Henrie) - assorted functional fixes and device ID additions * tag 'for-linus-2023042601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (54 commits) HID: amd_sfh: Fix max supported HID devices HID: wacom: generic: Set battery quirk only when we see battery data HID: wacom: Lazy-init batteries HID: Ignore battery for ELAN touchscreen on ROG Flow X13 GV301RA HID: asus: explicitly include linux/leds.h HID: lg-g15: explicitly include linux/leds.h HID: steelseries: explicitly include linux/leds.h HID: apple: Set the tilde quirk flag on the Geyser 3 HID: apple: explicitly include linux/leds.h HID: mcp2221: fix get and get_direction for gpio HID: mcp2221: fix report layout for gpio get HID: wacom: Set a default resolution for older tablets HID: i2c-hid-of: Add reset GPIO support to i2c-hid-of HID: i2c-hid-of: Allow using i2c-hid-of on non OF platforms HID: i2c-hid-of: Consistenly use dev local variable in probe() HID: kye: Fix rdesc for kye tablets HID: amd_sfh: Support for additional light sensor HID: amd_sfh: Handle "no sensors" enabled for SFH1.1 HID: amd_sfh: Increase sensor command timeout for SFH1.1 HID: amd_sfh: Correct the stop all command ...
2023-04-27Merge tag 'sound-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "At this time, it's an interesting mixture of changes for both old and new stuff. Majority of changes are about ASoC (lots of systematic changes for converting remove callbacks to void, and cleanups), while we got the fixes and the enhancements of very old PCI cards, too. Here are some highlights: ALSA/ASoC Core: - Continued effort of more ASoC core cleanups - Minor improvements for XRUN handling in indirect PCM helpers - Code refactoring of PCM core code ASoC: - Continued feature and simplification work on SOF, including addition of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4 protocol - Hibernation support for CS35L45 - More DT binding conversions - Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363, nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car Gen4, Rockchip RK3588 and TI TAS5733 ALSA: - Lots of works for legacy emu10k1 and ymfpci PCI drivers - PCM kselftest fixes and enhancements" * tag 'sound-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (586 commits) ALSA: emu10k1: use high-level I/O in set_filterQ() ALSA: emu10k1: use high-level I/O functions also during init ALSA: emu10k1: fix error handling in snd_audigy_i2c_volume_put() ALSA: emu10k1: don't stop DSP in _snd_emu10k1_{,audigy_}init_efx() ALSA: emu10k1: fix SNDRV_EMU10K1_IOCTL_SINGLE_STEP ALSA: emu10k1: skip Sound Blaster-specific hacks for E-MU cards ALSA: emu10k1: fixup DSP defines ALSA: emu10k1: pull in some register definitions from kX-project ALSA: emu10k1: remove some bogus defines ALSA: emu10k1: eliminate some unused defines ALSA: emu10k1: fix lineup of EMU_HANA_* defines ALSA: emu10k1: comment updates ALSA: emu10k1: fix snd_emu1010_fpga_read() input masking for rev2 cards ALSA: emu10k1: remove unused emu->pcm_playback_efx_substream field ALSA: emu10k1: remove unused `resume` parameter from snd_emu10k1_init() ALSA: emu10k1: minor optimizations ALSA: emu10k1: remove remaining cruft from snd_emu10k1_emu1010_init() ALSA: emu10k1: remove apparently pointless EMU_HANA_OPTION_CARDS reads ALSA: emu10k1: remove apparently pointless FPGA reads ALSA: emu10k1: stop doing weird things with HCFG in snd_emu10k1_emu1010_init() ...
2023-04-27Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two series: - Reorganize how the hardware page table objects are managed, particularly their destruction flow. Increase the selftest test coverage in this area by creating a more complete mock iommu driver. This is preparation to add a replace operation for HWPT binding, which is done but waiting for the VFIO parts to complete so there is a user. - Split the iommufd support for "access" to make it two step - allocate an access then link it to an IOAS. Update VFIO and have VFIO always create an access even for the VFIO mdevs that never do DMA. This is also preperation for the replace VFIO series that will allow replace to work on access types as well. Three minor fixes: - Sykzaller found the selftest code didn't check for overflow when processing user VAs - smatch noted a .data item should have been static - Add a selftest that reproduces a syzkaller bug for batch carry already fixed in rc" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (21 commits) iommufd/selftest: Cover domain unmap with huge pages and access iommufd/selftest: Set varaiable mock_iommu_device storage-class-specifier to static vfio: Check the presence for iommufd callbacks in __vfio_register_dev() vfio/mdev: Uses the vfio emulated iommufd ops set in the mdev sample drivers vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID vfio-iommufd: No need to record iommufd_ctx in vfio_device iommufd: Create access in vfio_iommufd_emulated_bind() iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas() iommufd/selftest: Catch overflow of uptr and length iommufd/selftest: Add a selftest for iommufd_device_attach() with a hwpt argument iommufd/selftest: Make selftest create a more complete mock device iommufd/selftest: Rename the remaining mock device_id's to stdev_id iommufd/selftest: Rename domain_id to hwpt_id for FIXTURE iommufd_mock_domain iommufd/selftest: Rename domain_id to stdev_id for FIXTURE iommufd_ioas iommufd/selftest: Rename the sefltest 'device_id' to 'stdev_id' iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain() iommufd: Move iommufd_device to iommufd_private.h iommufd: Move ioas related HWPT destruction into iommufd_hw_pagetable_destroy() iommufd: Consistently manage hwpt_item iommufd: Add iommufd_lock_obj() around the auto-domains hwpts ...
2023-04-26Merge tag 'net-next-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances - Reduce compound page head access for zero-copy data transfers - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking - Add lockless accesses annotation to sk_err[_soft] - Optimize again the skb struct layout - Extends the skb drop reasons to make it usable by multiple subsystems - Better const qualifier awareness for socket casts BPF: - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward - Add more precise memory usage reporting for all BPF map types - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them - Add ARM32 USDT support to libbpf - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations Protocols: - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter: - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support - The iptables 32bit compat interface isn't compiled in by default anymore - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device Driver API: - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them - Allow the page_pool to directly recycle the pages from safely localized NAPI - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization - Add YNL support for user headers and struct attrs - Add partial YNL specification for devlink - Add partial YNL specification for ethtool - Add tc-mqprio and tc-taprio support for preemptible traffic classes - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device - Add basic LED support for switch/phy - Add NAPI documentation, stop relaying on external links - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space - Add transceiver support and improve the error messages for CAN-FD controllers New hardware / drivers: - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers: - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors - add support for configuring max SDU for each Tx queue - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support" * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits) net: phy: hide the PHYLIB_LEDS knob net: phy: marvell-88x2222: remove unnecessary (void*) conversions tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed net: phy: marvell: Fix inconsistent indenting in led_blink_set lan966x: Don't use xdp_frame when action is XDP_TX tsnep: Add XDP socket zero-copy TX support tsnep: Add XDP socket zero-copy RX support tsnep: Move skb receive action to separate function tsnep: Add functions for queue enable/disable tsnep: Rework TX/RX queue initialization tsnep: Replace modulo operation with mask net: phy: dp83867: Add led_brightness_set support net: phy: Fix reading LED reg property drivers: nfc: nfcsim: remove return value check of `dev_dir` net: phy: dp83867: Remove unnecessary (void*) conversions net: ethtool: coalesce: try to make user settings stick twice net: mana: Check if netdev/napi_alloc_frag returns single page net: mana: Rename mana_refill_rxoob and remove some empty lines net: veth: add page_pool stats ...
2023-04-26Merge branch 'for-6.4/tests' into for-linusJiri Kosina
- import of bunch of HID selftests from out-of-tree hid-tools project (Benjamin Tissoires)
2023-04-26Merge tag 'kvm-x86-selftests-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests, and an AMX/XCR0 bugfix, for 6.4: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Overhaul the AMX selftests to improve coverage and cleanup the test - Misc cleanups
2023-04-26Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.4: - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - Misc cleanups and fixes
2023-04-26Merge tag 'kvmarm-6.4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.4 - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements.