linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-13	selftests/resctrl: Remove unnecessary __u64 -> unsigned long conversion	Ilpo Järvinen
	Perf counters are __u64 but the code converts them to unsigned long before printing them out. Remove unnecessary type conversion and retain the perf originating value as __u64. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split show_cache_info() to test specific and generic parts	Ilpo Järvinen
	show_cache_info() calculates results and provides generic cache information. This makes it hard to alter pass/fail conditions. Separate the test specific checks into CAT and CMT test files and leave only the generic information part into show_cache_info(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split measure_cache_vals()	Ilpo Järvinen
	measure_cache_vals() does a different thing depending on the test case that called it: - For CAT, it measures LLC misses through perf. - For CMT, it measures LLC occupancy through resctrl. Split these two functionalities into own functions the CAT and CMT tests can call directly. Replace passing the struct resctrl_val_param parameter with the filename because it's more generic and all those functions need out of resctrl_val. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Exclude shareable bits from schemata in CAT test	Ilpo Järvinen
	CAT test doesn't take shareable bits into account, i.e., the test might be sharing cache with some devices (e.g., graphics). Introduce get_mask_no_shareable() and use it to provision an environment for CAT test where the allocated LLC is isolated better. Excluding shareable_bits may create hole(s) into the cbm_mask, thus add a new helper count_contiguous_bits() to find the longest contiguous set of CBM bits. create_bit_mask() is needed by an upcoming CAT test rewrite so make it available in resctrl.h right away. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Create cache_portion_size() helper	Ilpo Järvinen
	CAT and CMT tests calculate size of the cache portion for the n-bits cache allocation on their own. Add cache_portion_size() helper that calculates size of the cache portion for the given number of bits and use it to replace the existing span calculations. This also prepares for the new CAT test that will need to determine the size of the cache portion also during results processing. Rename also 'cache_size' local variables to 'cache_total_size' to prevent misinterpretations. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Mark get_cache_size() cache_type const	Ilpo Järvinen
	get_cache_size() does not modify cache_type so it could be const. Mark cache_type const so that const char * can be passed to it. This prevents warnings once many of the test parameters are marked const. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Refactor get_cbm_mask() and rename to get_full_cbm()	Ilpo Järvinen
	Callers of get_cbm_mask() are required to pass a string into which the capacity bitmask (CBM) is read. Neither CAT nor CMT tests need the bitmask as string but just convert it into an unsigned long value. Another limitation is that the bit mask reader can only read .../cbm_mask files. Generalize the bit mask reading function into get_bit_mask() such that it can be used to handle other files besides the .../cbm_mask and handles the unsigned long conversion within get_bit_mask() using fscanf(). Change get_cbm_mask() to use get_bit_mask() and rename it to get_full_cbm() to better indicate what the function does. Return error from get_full_cbm() if the bitmask is zero for some reason because it makes the code more robust as the selftests naturally assume the bitmask has some bits. Also mark cache_type const while at it and remove useless comments that are related to processing of CBM bits. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Refactor fill_buf functions	Ilpo Järvinen
	There are unnecessary nested calls in fill_buf.c: - run_fill_buf() calls fill_cache() - alloc_buffer() calls malloc_and_init_memory() Simplify the code flow and remove those unnecessary call levels by moving the called code inside the calling function and remove the duplicated error print. Resolve the difference in run_fill_buf() and fill_cache() parameter name into 'buf_size' which is more descriptive than 'span'. Also, while moving the allocation related code, rename 'p' into 'buf' to be consistent in naming the variables. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split fill_buf to allow tests finer-grained control	Ilpo Järvinen
	MBM, MBA and CMT test cases call run_fill_buf() that in turn calls fill_cache() to alloc and loop indefinitely around the buffer. This binds buffer allocation and running the benchmark into a single bundle so that a selftest cannot allocate a buffer once and reuse it. CAT test doesn't want to loop around the buffer continuously and after rewrite it needs the ability to allocate the buffer separately. Split buffer allocation out of fill_cache() into alloc_buffer(). This change is part of preparation for the new CAT test that allocates a buffer and does multiple passes over the same buffer (but not in an infinite loop). Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Change function comments to say < 0 on error	Ilpo Järvinen
	A number function comments state the function return non-zero on failure but in reality they can only return 0 on success and < 0 on error. Update the comments to say < 0 on error to match the behavior. While at it, improve cat_val() comment to state that 0 means the test was run (either pass or fail). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Don't use ctrlc_handler() outside signal handling	Ilpo Järvinen
	perf_event_open_llc_miss() calls ctrlc_handler() to cleanup if perf_event_open() returns an error. Those cleanups, however, are not the responsibility of perf_event_open_llc_miss() and it thus interferes unnecessarily with the usual cleanup pattern. Worse yet, ctrlc_handler() calls exit() in the end preventing the ordinary cleanup done in the calling function from executing. ctrlc_handler() should only be used as a signal handler, not during normal error handling. Remove call to ctrlc_handler() from perf_event_open_llc_miss(). As unmounting resctrlfs and test cleanup are already handled properly by error rollbacks in the calling functions, no other changes are necessary. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Return -1 instead of errno on error	Ilpo Järvinen
	A number of functions in the resctrl selftests return errno. It is problematic because errno is positive which is often counterintuitive. Also, every site returning errno prints the error message already with ksft_perror() so there is not much added value in returning the precise error code. Simply convert all places returning errno to return -1 that is typical userspace error code in case of failures. While at it, improve resctrl_val() comment to state that 0 means the test was run (either pass or fail). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Convert perror() to ksft_perror() or ksft_print_msg()	Ilpo Järvinen
	The resctrl selftest code contains a number of perror() calls. Some of them come with hash character and some don't. The kselftest framework provides ksft_perror() that is compatible with test output formatting so it should be used instead of adding custom hash signs. Some perror() calls are too far away from anything that sets error. For those call sites, ksft_print_msg() must be used instead. Convert perror() to ksft_perror() or ksft_print_msg(). Other related changes: - Remove hash signs - Remove trailing stops & newlines from ksft_perror() - Add terminating newlines for converted ksft_print_msg() - Use consistent capitalization - Small fixes/tweaks to typos & grammar of the messages - Extract error printing out of PARENT_EXIT() to be able to differentiate Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	libbpf: Add support to GCC in CORE macro definitions	Cupertino Miranda
	Due to internal differences between LLVM and GCC the current implementation for the CO-RE macros does not fit GCC parser, as it will optimize those expressions even before those would be accessible by the BPF backend. As examples, the following would be optimized out with the original definitions: - As enums are converted to their integer representation during parsing, the IR would not know how to distinguish an integer constant from an actual enum value. - Types need to be kept as temporary variables, as the existing type casts of the 0 address (as expanded for LLVM), are optimized away by the GCC C parser, never really reaching GCCs IR. Although, the macros appear to add extra complexity, the expanded code is removed from the compilation flow very early in the compilation process, not really affecting the quality of the generated assembly. Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
2024-02-13	bpf: Abstract loop unrolling pragmas in BPF selftests	Jose E. Marchesi
	[Changes from V1: - Avoid conflict by rebasing with latest master.] Some BPF tests use loop unrolling compiler pragmas that are clang specific and not supported by GCC. These pragmas, along with their GCC equivalences are: #pragma clang loop unroll_count(N) #pragma GCC unroll N #pragma clang loop unroll(full) #pragma GCC unroll 65534 #pragma clang loop unroll(disable) #pragma GCC unroll 1 #pragma unroll [aka #pragma clang loop unroll(enable)] There is no GCC equivalence to this pragma. It enables unrolling on loops that the compiler would not ordinarily unroll even with -O2\|-funroll-loops, but it is not equivalent to full unrolling either. This patch adds a new header progs/bpf_compiler.h that defines the following macros, which correspond to each pair of compiler-specific pragmas above: __pragma_loop_unroll_count(N) __pragma_loop_unroll_full __pragma_loop_no_unroll __pragma_loop_unroll The selftests using loop unrolling pragmas are then changed to include the header and use these macros in place of the explicit pragmas. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240208203612.29611-1-jose.marchesi@oracle.com
2024-02-13	selftests/bpf: Ensure fentry prog cannot attach to bpf_spin_{lock,unlcok}()	Yonghong Song
	Add two tests to ensure fentry programs cannot attach to bpf_spin_{lock,unlock}() helpers. The tracing_failure.c files can be used in the future for other tracing failure cases. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240207070107.335341-1-yonghong.song@linux.dev
2024-02-13	selftests: net: more pmtu.sh fixes	Paolo Abeni
	The netdev CI is reporting failures for the pmtu test: [ 115.929264] br0: port 2(vxlan_a) entered forwarding state # 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:50000}, 28): Address already in use # 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused # TEST: IPv6, bridged vxlan4: PMTU exceptions [FAIL] # File size 0 mismatches exepcted value in locally bridged vxlan test The root cause is apparently a socket created by a previous iteration of the relevant loop still lasting in LAST_ACK state. Note that even the file size check is racy, the receiver process dumping the file could still be running in background Allow the listener to bound on the same local port via SO_REUSEADDR and collect file output file size only after the listener completion. Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4f51c11a1ce7ca7a4dabd926cffff63dadac9ba1.1707731086.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: more strict check in net_helper	Paolo Abeni
	The helper waiting for a listener port can match any socket whose hexadecimal representation of source or destination addresses matches that of the given port. Additionally, any socket state is accepted. All the above can let the helper return successfully before the relevant listener is actually ready, with unexpected results. So far I could not find any related failure in the netdev CI, but the next patch is going to make the critical event more easily reproducible. Address the issue matching the port hex only vs the relevant socket field and additionally checking the socket state for TCP sockets. Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.1707731086.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: cope with slow env in so_txtime.sh test	Paolo Abeni
	The mentioned test is failing in slow environments: # SO_TXTIME ipv4 clock monotonic # ./so_txtime: recv: timeout: Resource temporarily unavailable not ok 1 selftests: net: so_txtime.sh # exit=1 Tuning the tolerance in the test binary is error-prone and doomed to failures is slow-enough environment. Just resort to suppress any error in such cases. Note to suppress them we need first to refactor a bit the code moving it to explicit error handling. Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/2142d9ed4b5c5aa07dd1b455779625d91b175373.1707730902.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: cope with slow env in gro.sh test	Paolo Abeni
	The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/97d3ba83f5a2bfeb36f6bc0fb76724eb3dafb608.1707729403.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	KVM: selftests: Print timer ctl register in ISTATUS assertion	Oliver Upton
	Zenghui noted that the test assertion for the ISTATUS bit is printing the current timer value instead of the control register in the case of failure. While the assertion is sound, printing CNT isn't informative. Change things around to actually print the CTL register value instead. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Closes: https://lore.kernel.org/kvmarm/3188e6f1-f150-f7d0-6c2b-5b7608b0b012@huawei.com/ Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Link: https://lore.kernel.org/r/20240212210932.3095265-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-12	selftests: net: ip_local_port_range: define IPPROTO_MPTCP	Maxim Galaganov
	Older glibc's netinet/in.h may leave IPPROTO_MPTCP undefined when building ip_local_port_range.c, that leads to "error: use of undeclared identifier 'IPPROTO_MPTCP'". Define IPPROTO_MPTCP in such cases, just like in other MPTCP selftests. Fixes: 122db5e3634b ("selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/netdev/CA+G9fYvGO5q4o_Td_kyQgYieXWKw6ktMa-Q0sBu6S-0y3w2aEQ@mail.gmail.com/ Signed-off-by: Maxim Galaganov <max@internet.ru> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/r/20240209132512.254520-1-max@internet.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12	KVM: selftests: Fix GUEST_PRINTF() format warnings in ARM code	Sean Christopherson
	Fix a pile of -Wformat warnings in the KVM ARM selftests code, almost all of which are benign "long" versus "long long" issues (selftests are 64-bit only, and the guest printf code treats "ll" the same as "l"). The code itself isn't problematic, but the warnings make it impossible to build ARM selftests with -Werror, which does detect real issues from time to time. Opportunistically have GUEST_ASSERT_BITMAP_REG() interpret set_expected, which is a bool, as an unsigned decimal value, i.e. have it print '0' or '1' instead of '0x0' or '0x1'. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240202234603.366925-1-seanjc@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-12	perf maps: Locking tidy up of nr_maps	Ian Rogers
	After this change maps__nr_maps is only used by tests, existing users are migrated to maps__empty. Compute maps__empty under the read lock. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-7-irogers@google.com
2024-02-12	perf maps: Hide maps internals	Ian Rogers
	Move the struct into the C file. Add maps__equal to work around exposing the struct for reference count checking. Add accessors for the unwind_libunwind_ops. Move maps_list_node to its only use in symbol.c. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
2024-02-12	perf maps: Get map before returning in maps__find_next_entry	Ian Rogers
	Finding a map is done under a lock, returning the map without a reference count means it can be removed without notice and causing uses after free. Grab a reference count to the map within the lock region and return this. Fix up locations that need a map__put following this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-5-irogers@google.com
2024-02-12	perf maps: Get map before returning in maps__find_by_name	Ian Rogers
	Finding a map is done under a lock, returning the map without a reference count means it can be removed without notice and causing uses after free. Grab a reference count to the map within the lock region and return this. Fix up locations that need a map__put following this. Also fix some reference counted pointer comparisons. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-4-irogers@google.com
2024-02-12	perf maps: Get map before returning in maps__find	Ian Rogers
	Finding a map is done under a lock, returning the map without a reference count means it can be removed without notice and causing uses after free. Grab a reference count to the map within the lock region and return this. Fix up locations that need a map__put following this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-3-irogers@google.com
2024-02-12	perf maps: Switch from rbtree to lazily sorted array for addresses	Ian Rogers
	Maps is a collection of maps primarily sorted by the starting address of the map. Prior to this change the maps were held in an rbtree requiring 4 pointers per node. Prior to reference count checking, the rbnode was embedded in the map so 3 pointers per node were necessary. This change switches the rbtree to an array lazily sorted by address, much as the array sorting nodes by name. 1 pointer is needed per node, but to avoid excessive resizing the backing array may be twice the number of used elements. Meaning the memory overhead is roughly half that of the rbtree. For a perf record with "--no-bpf-event -g -a" of true, the memory overhead of perf inject is reduce fom 3.3MB to 3MB, so 10% or 300KB is saved. Map inserts always happen at the end of the array. The code tracks whether the insertion violates the sorting property. O(log n) rb-tree complexity is switched to O(1). Remove slides the array, so O(log n) rb-tree complexity is degraded to O(n). A find may need to sort the array using qsort which is O(n*log n), but in general the maps should be sorted and so average performance should be O(log n) as with the rbtree. An rbtree node consumes a cache line, but with the array 4 nodes fit on a cache line. Iteration is simplified to scanning an array rather than pointer chasing. Overall it is expected the performance after the change should be comparable to before, but with half of the memory consumed. To avoid a list and repeated logic around splitting maps, maps__merge_in is rewritten in terms of maps__fixup_overlap_and_insert. maps_merge_in splits the given mapping inserting remaining gaps. maps__fixup_overlap_and_insert splits the existing mappings, then adds the incoming mapping. By adding the new mapping first, then re-inserting the existing mappings the splitting behavior matches. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
2024-02-12	Merge branch 'perf-tools' into perf-tools-next	Namhyung Kim
	To get some fixes in the perf test and JSON metrics into the development branch. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-02-12	selftests/net: Adding test cases of replacing routes and route advertisements.	Kui-Feng Lee
	Add tests of changing permanent routes to temporary routes and the reversed case to make sure GC working correctly in these cases. Add tests for the temporary routes from RA. The existing device will be deleted between tests to remove all routes associated with it, so that the earlier tests don't mess up the later ones. Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12	selftests: net: ignore timing errors in txtimestamp if KSFT_MACHINE_SLOW	Paolo Abeni
	This test is time sensitive. It may fail on virtual machines and for debug builds. Similar to commit c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW"), optionally suppress failure for timing errors (only). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12	tools/rtla: Exit with EXIT_SUCCESS when help is invoked	John Kacur
	Fix rtla so that the following commands exit with 0 when help is invoked rtla osnoise top -h rtla osnoise hist -h rtla timerlat top -h rtla timerlat hist -h Link: https://lore.kernel.org/linux-trace-devel/20240203001607.69703-1-jkacur@redhat.com Cc: stable@vger.kernel.org Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rtla: Replace setting prio with nice for SCHED_OTHER	limingming3
	Since the sched_priority for SCHED_OTHER is always 0, it makes no sence to set it. Setting nice for SCHED_OTHER seems more meaningful. Link: https://lkml.kernel.org/r/20240207065142.1753909-1-limingming3@lixiang.com Cc: stable@vger.kernel.org Fixes: b1696371d865 ("rtla: Helper functions for rtla") Signed-off-by: limingming3 <limingming3@lixiang.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rv: Fix curr_reactor uninitialized variable	Daniel Bristot de Oliveira
	clang is reporting: $ make HOSTCC=clang CC=clang LLVM_IAS=1 clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -I include -c -o src/in_kernel.o src/in_kernel.c [...] src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 227 \| if (!end) \| ^~~~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 \| return curr_reactor; \| ^~~~~~~~~~~~ src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false 227 \| if (!end) \| ^~~~~~~~~ 228 \| goto out_free; \| ~~~~~~~~~~~~~ src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 221 \| if (!start) \| ^~~~~~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 \| return curr_reactor; \| ^~~~~~~~~~~~ src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false 221 \| if (!start) \| ^~~~~~~~~~~ 222 \| goto out_free; \| ~~~~~~~~~~~~~ src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence this warning 215 \| char *curr_reactor; \| ^ \| = NULL 2 warnings generated. Which is correct. Setting curr_reactor to NULL avoids the problem. Link: https://lkml.kernel.org/r/3a35551149e5ee0cb0950035afcb8082c3b5d05b.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Donald Zickus <dzickus@redhat.com> Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rv: Fix Makefile compiler options for clang	Daniel Bristot de Oliveira
	The following errors are showing up when compiling rv with clang: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs) -I include -c -o src/utils.o src/utils.c clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated. clang -o rv -ggdb src/in_kernel.o src/rv.o src/trace.o src/utils.o $(pkg-config --libs libtracefs) src/in_kernel.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rv] Error 1 Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto Link: https://lkml.kernel.org/r/ed94a8ddc2ca8c8ef663cfb7ae9dd196c4a66b33.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Fixes: 4bc4b131d44c ("rv: Add rv tool") Suggested-by: Donald Zickus <dzickus@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rtla: Remove unused sched_getattr() function	Daniel Bristot de Oliveira
	Clang is reporting: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function] 241 \| static inline int sched_getattr(pid_t pid, struct sched_attr *attr, \| ^~~~~~~~~~~~~ 1 warning generated. Which is correct, so remove the unused function. Link: https://lkml.kernel.org/r/eaed7ba122c4ae88ce71277c824ef41cbf789385.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Donald Zickus <dzickus@redhat.com> Fixes: b1696371d865 ("rtla: Helper functions for rtla") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rtla: Fix clang warning about mount_point var size	Daniel Bristot de Oliveira
	clang is reporting this warning: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in argument 3 has size 1024, but the corresponding specifier may require size 1025 [-Wfortify-source] 548 \| while (fscanf(fp, "%s %" STR(MAX_PATH) "s %99s %s %d %d\n", mount_point, type) == 2) { \| ^ Increase mount_point variable size to MAX_PATH+1 to avoid the overflow. Link: https://lkml.kernel.org/r/1b46712e93a2f4153909514a36016959dcc4021c.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Donald Zickus <dzickus@redhat.com> Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rtla: Fix uninitialized bucket/data->bucket_size warning	Daniel Bristot de Oliveira
	When compiling rtla with clang, I am getting the following warnings: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [..] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/osnoise_hist.o src/osnoise_hist.c src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 138 \| if (data->bucket_size) \| ^~~~~~~~~~~~~~~~~ src/osnoise_hist.c:149:6: note: uninitialized use occurs here 149 \| if (bucket < entries) \| ^~~~~~ src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true 138 \| if (data->bucket_size) \| ^~~~~~~~~~~~~~~~~~~~~~ 139 \| bucket = duration / data->bucket_size; src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning 132 \| int bucket; \| ^ \| = 0 1 warning generated. [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/timerlat_hist.o src/timerlat_hist.c src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 181 \| if (data->bucket_size) \| ^~~~~~~~~~~~~~~~~ src/timerlat_hist.c:204:6: note: uninitialized use occurs here 204 \| if (bucket < entries) \| ^~~~~~ src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true 181 \| if (data->bucket_size) \| ^~~~~~~~~~~~~~~~~~~~~~ 182 \| bucket = latency / data->bucket_size; src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning 175 \| int bucket; \| ^ \| = 0 1 warning generated. This is a legit warning, but data->bucket_size is always > 0 (see timerlat_hist_parse_args()), so the if is not necessary. Remove the unneeded if (data->bucket_size) to avoid the warning. Link: https://lkml.kernel.org/r/6e1b1665cd99042ae705b3e0fc410858c4c42346.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Donald Zickus <dzickus@redhat.com> Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-12	tools/rtla: Fix Makefile compiler options for clang	Daniel Bristot de Oliveira
	The following errors are showing up when compiling rtla with clang: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated. clang -o rtla -ggdb src/osnoise.o src/osnoise_hist.o src/osnoise_top.o src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config --libs libtracefs) src/osnoise.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rtla] Error 1 Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto Link: https://lore.kernel.org/linux-trace-kernel/567ac1b94effc228ce9a0225b9df7232a9b35b55.1707217097.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Fixes: 1a7b22ab15eb ("tools/rtla: Build with EXTRA_{C,LD}FLAGS") Suggested-by: Donald Zickus <dzickus@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-02-11	bpf: Allow compiler to inline most of bpf_local_storage_lookup()	Marco Elver
	In various performance profiles of kernels with BPF programs attached, bpf_local_storage_lookup() appears as a significant portion of CPU cycles spent. To enable the compiler generate more optimal code, turn bpf_local_storage_lookup() into a static inline function, where only the cache insertion code path is outlined Notably, outlining cache insertion helps avoid bloating callers by duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on architectures which do not inline spin_lock/unlock, such as x86), which would cause the compiler produce worse code by deciding to outline otherwise inlinable functions. The call overhead is neutral, because we make 2 calls either way: either calling raw_spin_lock_irqsave() and raw_spin_unlock_irqsave(); or call __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(), followed by a tail-call to raw_spin_unlock_irqsave() where the compiler can perform TCO and (in optimized uninstrumented builds) turns it into a plain jump. The call to __bpf_local_storage_insert_cache() can be elided entirely if cacheit_lockit is a false constant expression. Based on results from './benchs/run_bench_local_storage.sh' (21 trials, reboot between each trial; x86 defconfig + BPF, clang 16) this produces improvements in throughput and latency in the majority of cases, with an average (geomean) improvement of 8%: +---- Hashmap Control -------------------- \| \| + num keys: 10 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 14.789 M ops/s \| 14.745 M ops/s ( ~ ) \| +- hits latency \| 67.679 ns/op \| 67.879 ns/op ( ~ ) \| +- important_hits throughput \| 14.789 M ops/s \| 14.745 M ops/s ( ~ ) \| \| + num keys: 1000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 12.233 M ops/s \| 12.170 M ops/s ( ~ ) \| +- hits latency \| 81.754 ns/op \| 82.185 ns/op ( ~ ) \| +- important_hits throughput \| 12.233 M ops/s \| 12.170 M ops/s ( ~ ) \| \| + num keys: 10000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 7.220 M ops/s \| 7.204 M ops/s ( ~ ) \| +- hits latency \| 138.522 ns/op \| 138.842 ns/op ( ~ ) \| +- important_hits throughput \| 7.220 M ops/s \| 7.204 M ops/s ( ~ ) \| \| + num keys: 100000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 5.061 M ops/s \| 5.165 M ops/s (+2.1%) \| +- hits latency \| 198.483 ns/op \| 194.270 ns/op (-2.1%) \| +- important_hits throughput \| 5.061 M ops/s \| 5.165 M ops/s (+2.1%) \| \| + num keys: 4194304 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 2.864 M ops/s \| 2.882 M ops/s ( ~ ) \| +- hits latency \| 365.220 ns/op \| 361.418 ns/op (-1.0%) \| +- important_hits throughput \| 2.864 M ops/s \| 2.882 M ops/s ( ~ ) \| +---- Local Storage ---------------------- \| \| + num_maps: 1 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 33.005 M ops/s \| 39.068 M ops/s (+18.4%) \| +- hits latency \| 30.300 ns/op \| 25.598 ns/op (-15.5%) \| +- important_hits throughput \| 33.005 M ops/s \| 39.068 M ops/s (+18.4%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 37.151 M ops/s \| 44.926 M ops/s (+20.9%) \| +- hits latency \| 26.919 ns/op \| 22.259 ns/op (-17.3%) \| +- important_hits throughput \| 37.151 M ops/s \| 44.926 M ops/s (+20.9%) \| \| + num_maps: 10 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 32.288 M ops/s \| 38.099 M ops/s (+18.0%) \| +- hits latency \| 30.972 ns/op \| 26.248 ns/op (-15.3%) \| +- important_hits throughput \| 3.229 M ops/s \| 3.810 M ops/s (+18.0%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 34.473 M ops/s \| 41.145 M ops/s (+19.4%) \| +- hits latency \| 29.010 ns/op \| 24.307 ns/op (-16.2%) \| +- important_hits throughput \| 12.312 M ops/s \| 14.695 M ops/s (+19.4%) \| \| + num_maps: 16 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 32.524 M ops/s \| 38.341 M ops/s (+17.9%) \| +- hits latency \| 30.748 ns/op \| 26.083 ns/op (-15.2%) \| +- important_hits throughput \| 2.033 M ops/s \| 2.396 M ops/s (+17.9%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 34.575 M ops/s \| 41.338 M ops/s (+19.6%) \| +- hits latency \| 28.925 ns/op \| 24.193 ns/op (-16.4%) \| +- important_hits throughput \| 11.001 M ops/s \| 13.153 M ops/s (+19.6%) \| \| + num_maps: 17 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 28.861 M ops/s \| 32.756 M ops/s (+13.5%) \| +- hits latency \| 34.649 ns/op \| 30.530 ns/op (-11.9%) \| +- important_hits throughput \| 1.700 M ops/s \| 1.929 M ops/s (+13.5%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 31.529 M ops/s \| 36.110 M ops/s (+14.5%) \| +- hits latency \| 31.719 ns/op \| 27.697 ns/op (-12.7%) \| +- important_hits throughput \| 9.598 M ops/s \| 10.993 M ops/s (+14.5%) \| \| + num_maps: 24 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 18.602 M ops/s \| 19.937 M ops/s (+7.2%) \| +- hits latency \| 53.767 ns/op \| 50.166 ns/op (-6.7%) \| +- important_hits throughput \| 0.776 M ops/s \| 0.831 M ops/s (+7.2%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 21.718 M ops/s \| 23.332 M ops/s (+7.4%) \| +- hits latency \| 46.047 ns/op \| 42.865 ns/op (-6.9%) \| +- important_hits throughput \| 6.110 M ops/s \| 6.564 M ops/s (+7.4%) \| \| + num_maps: 32 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 14.118 M ops/s \| 14.626 M ops/s (+3.6%) \| +- hits latency \| 70.856 ns/op \| 68.381 ns/op (-3.5%) \| +- important_hits throughput \| 0.442 M ops/s \| 0.458 M ops/s (+3.6%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 17.111 M ops/s \| 17.906 M ops/s (+4.6%) \| +- hits latency \| 58.451 ns/op \| 55.865 ns/op (-4.4%) \| +- important_hits throughput \| 4.776 M ops/s \| 4.998 M ops/s (+4.6%) \| \| + num_maps: 100 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 5.281 M ops/s \| 5.528 M ops/s (+4.7%) \| +- hits latency \| 192.398 ns/op \| 183.059 ns/op (-4.9%) \| +- important_hits throughput \| 0.053 M ops/s \| 0.055 M ops/s (+4.9%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 6.265 M ops/s \| 6.498 M ops/s (+3.7%) \| +- hits latency \| 161.436 ns/op \| 152.877 ns/op (-5.3%) \| +- important_hits throughput \| 1.636 M ops/s \| 1.697 M ops/s (+3.7%) \| \| + num_maps: 1000 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 0.355 M ops/s \| 0.354 M ops/s ( ~ ) \| +- hits latency \| 2826.538 ns/op \| 2827.139 ns/op ( ~ ) \| +- important_hits throughput \| 0.000 M ops/s \| 0.000 M ops/s ( ~ ) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 0.404 M ops/s \| 0.403 M ops/s ( ~ ) \| +- hits latency \| 2481.190 ns/op \| 2487.555 ns/op ( ~ ) \| +- important_hits throughput \| 0.102 M ops/s \| 0.101 M ops/s ( ~ ) The on_lookup test in {cgrp,task}_ls_recursion.c is removed because the bpf_local_storage_lookup is no longer traceable and adding tracepoint will make the compiler generate worse code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/ Signed-off-by: Marco Elver <elver@google.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-10	Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions" * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) nilfs2: fix potential bug in end_buffer_async_write mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() MAINTAINERS: Leo Yan has moved mm/zswap: don't return LRU_SKIP if we have dropped lru lock fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super mailmap: switch email address for John Moon mm: zswap: fix objcg use-after-free in entry destruction mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range() arch/arm/mm: fix major fault accounting when retrying under per-VMA lock selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page mm: memcg: optimize parent iteration in memcg_rstat_updated() nilfs2: fix data corruption in dsync block recovery for small block sizes mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get() exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand() getrusage: use sig->stats_lock rather than lock_task_sighand() getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() ...
2024-02-10	selftests: tls: use exact comparison in recv_partial	Jakub Kicinski
	This exact case was fail for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-09	perf srcline: Add missed addr2line closes	Ian Rogers
	The child_process for addr2line sets in and out to -1 so that pipes get created. It is the caller's responsibility to close the pipes, finish_command doesn't do it. Add the missed closes. Fixes: b3801e791231 ("perf srcline: Simplify addr2line subprocess") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240201001504.1348511-8-irogers@google.com
2024-02-09	work around gcc bugs with 'asm goto' with outputs	Linus Torvalds
	We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional"). Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around. Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case. It looks like there are at least two separate issues that all hit in this area: (a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 which is easy to work around by just adding the 'volatile' by hand. (b) Internal compiler errors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround. but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'. but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue. Reported-and-tested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09	perf stat: Support per-cluster aggregation	Yicong Yang
	Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Jacobsville). Currently parsing and building cluster topology have been supported since [1]. perf stat has already supported aggregation for other topologies like die or socket, etc. It'll be useful to aggregate per-cluster to find problems like L3T bandwidth contention. This patch add support for "--per-cluster" option for per-cluster aggregation. Also update the docs and related test. The output will be like: [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5 Performance counter stats for 'system wide': S56-D0-CLS158 4 1,321,521,570 LLC-load S56-D0-CLS594 4 794,211,453 LLC-load S56-D0-CLS1030 4 41,623 LLC-load S56-D0-CLS1466 4 41,646 LLC-load S56-D0-CLS1902 4 16,863 LLC-load S56-D0-CLS2338 4 15,721 LLC-load S56-D0-CLS2774 4 22,671 LLC-load [...] On a legacy system without cluster or cluster support, the output will be look like: [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1 Performance counter stats for 'system wide': S56-D0-CLS0 64 18,011,485 cycles S7182-D0-CLS0 64 16,548,835 cycles Note that this patch doesn't mix the cluster information in the outputs of --per-core to avoid breaking any tools/scripts using it. Note that perf recently supports "--per-cache" aggregation, but it's not the same with the cluster although cluster CPUs may share some cache resources. For example on my machine all clusters within a die share the same L3 cache: $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list 0-31 $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list 0-3 [1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die") Tested-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: james.clark@arm.com Cc: 21cnbao@gmail.com Cc: prime.zeng@hisilicon.com Cc: Jonathan.Cameron@huawei.com Cc: fanghao11@huawei.com Cc: linuxarm@huawei.com Cc: tim.c.chen@intel.com Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
2024-02-09	perf tools: Remove misleading comments on map functions	Namhyung Kim
	When it converts sample IP to or from objdump-capable one, there's a comment saying that kernel modules have DSO_SPACE__USER. But commit 02213cec64bb ("perf maps: Mark module DSOs with kernel type") changed it and makes the comment confusing. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240208181025.1329645-1-namhyung@kernel.org
2024-02-09	perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()	Yang Jihong
	slist needs to be freed in both error path and normal path in thread_map__new_by_tid_str(). Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-6-yangjihong1@huawei.com
2024-02-09	perf sched: Move curr_pid and cpu_last_switched initialization to ↵	Yang Jihong
	perf_sched__{lat\|map\|replay}() The curr_pid and cpu_last_switched are used only for the 'perf sched replay/latency/map'. Put their initialization in perf_sched__{lat\|map\|replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.209 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ] # perf sched lat ------------------------------------------------------------------------------------------------------------------------------------------- Task \| Runtime ms \| Switches \| Avg delay ms \| Max delay ms \| Max delay start \| Max delay end \| ------------------------------------------------------------------------------------------------------------------------------------------- sched-messaging:(401) \| 2990.699 ms \| 38705 \| avg: 0.661 ms \| max: 67.046 ms \| max start: 456532.624830 s \| max end: 456532.691876 s qemu-system-x86:(7) \| 179.764 ms \| 2191 \| avg: 0.152 ms \| max: 21.857 ms \| max start: 456532.576434 s \| max end: 456532.598291 s sshd:48125 \| 0.522 ms \| 2 \| avg: 0.037 ms \| max: 0.046 ms \| max start: 456532.514610 s \| max end: 456532.514656 s <SNIP> ksoftirqd/11:82 \| 0.063 ms \| 1 \| avg: 0.005 ms \| max: 0.005 ms \| max start: 456532.769366 s \| max end: 456532.769371 s kworker/9:0-mm_:34624 \| 0.233 ms \| 20 \| avg: 0.004 ms \| max: 0.007 ms \| max start: 456532.690804 s \| max end: 456532.690812 s migration/13:93 \| 0.000 ms \| 1 \| avg: 0.004 ms \| max: 0.004 ms \| max start: 456532.512669 s \| max end: 456532.512674 s ----------------------------------------------------------------------------------------------------------------- TOTAL: \| 3180.750 ms \| 41368 \| --------------------------------------------------- # echo $? 0 # perf sched map A0 456532.510141 secs A0 => migration/0:15 . 456532.510171 secs . => swapper:0 . B0 456532.510261 secs B0 => migration/1:21 . . 456532.510279 secs <SNIP> L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 . . . . 456532.785979 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 . . . 456532.786054 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 . . 456532.786127 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 . 456532.786197 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs # echo $? 0 # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 66473 nsecs the run test took 1000002 nsecs the sleep test took 1082686 nsecs nr_run_events: 49334 nr_sleep_events: 50054 nr_wakeup_events: 34701 target-less wakeups: 165 multi-target wakeups: 766 task 0 ( swapper: 0), nr_events: 15419 task 1 ( swapper: 1), nr_events: 1 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 715 ( sched-messaging: 110248), nr_events: 1438 task 716 ( sched-messaging: 110249), nr_events: 512 task 717 ( sched-messaging: 110250), nr_events: 500 task 718 ( sched-messaging: 110251), nr_events: 537 task 719 ( sched-messaging: 110252), nr_events: 823 ------------------------------------------------------------ #1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35 #2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56 #3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39 #4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86 #5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34 #6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39 #7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09 #8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57 #9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90 #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com
2024-02-09	perf sched: Move curr_thread initialization to perf_sched__map()	Yang Jihong
	The curr_thread is used only for the 'perf sched map'. Put initialization in perf_sched__map() to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.526 MB perf.data (140095 samples) ] # perf sched map A0 451264.532445 secs A0 => migration/0:15 . 451264.532468 secs . => swapper:0 . B0 451264.532537 secs B0 => migration/1:21 . . 451264.532560 secs . . C0 451264.532644 secs C0 => migration/2:27 . . . 451264.532668 secs . . . D0 451264.532753 secs D0 => migration/3:33 . . . . 451264.532778 secs . . . . E0 451264.532861 secs E0 => migration/4:39 . . . . . 451264.532886 secs . . . . . F0 451264.532973 secs F0 => migration/5:45 <SNIP> A7 A7 A7 A7 A7 A7 . . . . . . . . . . 451264.790785 secs A7 A7 A7 A7 A7 A7 A7 . . . . . . . . . 451264.790858 secs A7 A7 A7 A7 A7 A7 A7 A7 . . . . . . . . 451264.790934 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 . . . . . . . 451264.791004 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . . . . . . 451264.791075 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . . . . . 451264.791143 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . . . . 451264.791232 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . . . 451264.791336 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . . 451264.791407 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 . 451264.791484 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 451264.791553 secs # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-4-yangjihong1@huawei.com