linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-09-10	btrfs: convert struct btrfs_writepage_fixup to use a folio	Josef Bacik
	Now the fixup creator and consumer use folios, change this to use a folio as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_writepage_cow_fixup() to use folio	Josef Bacik
	Instead of a page, use a folio for btrfs_writepage_cow_fixup. We already have a folio at the only caller, and the fixup worker uses folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_writepage_fixup_worker() to use a folio	Josef Bacik
	This function heavily messes with pages, instead update it to use a folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert submit_uncompressed_range() to take a folio	Josef Bacik
	This mostly uses folios already, update it to take a folio and update the rest of the function to use the folio instead of the page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert struct async_chunk to hold a folio	Josef Bacik
	Instead of passing in the page for ->locked_page, make it hold a locked_folio and then update the users of async_chunk to act accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_run_delalloc_range() to take a folio	Josef Bacik
	Now that every function that btrfs_run_delalloc_range calls takes a folio, update it to take a folio and update the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert run_delalloc_compressed() to take a folio	Josef Bacik
	This just passes the page into the compressed machinery to keep track of the locked page. Update this to take a folio and convert it to a page where appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_cleanup_ordered_extents() to take a folio	Josef Bacik
	Now that btrfs_cleanup_ordered_extents is operating mostly with folios, update it to use a folio instead of a page, and the update the function and the callers as appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_cleanup_ordered_extents() to use folios	Josef Bacik
	We walk through pages in this function and clear ordered, and the function for this uses folios. Update the function to use a folio for this whole operation. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert run_delalloc_nocow() to take a folio	Josef Bacik
	Now all of the functions that use locked_page in run_delalloc_nocow take a folio, update it to take a folio and update the caller. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert fallback_to_cow() to take a folio	Josef Bacik
	With this we can pass the folio directly into cow_file_range(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert cow_file_range() to take a folio	Josef Bacik
	Convert this to take a folio and pass it into all of the various cleanup functions. Update the callers to pass in a folio instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert cow_file_range_inline() to take a folio	Josef Bacik
	Now that we want the folio in this function, convert it to take a folio directly and use that. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert run_delalloc_cow() to take a folio	Josef Bacik
	We pass the folio into extent_write_locked_range, go ahead and take a folio to pass along, and update the callers to pass in a folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert extent_write_locked_range() to take a folio	Josef Bacik
	This mostly uses folios, convert it to take a folio instead and update the callers to pass in the folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert extent_clear_unlock_delalloc() to take a folio	Josef Bacik
	Instead of taking the locked page, take the locked folio so we can pass that into __process_folios_contig. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert process_one_page() to operate only on folios	Josef Bacik
	Now that this mostly uses folios, update it to take folios, use the folios that are passed in, and rename from process_one_page => process_one_folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert __process_pages_contig() to take a folio	Josef Bacik
	This operates mostly on folios, update it to take a folio for the locked folio instead of the page, rename from __process_pages_contig => __process_folios_contig. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert __unlock_for_delalloc() to take a folio	Josef Bacik
	All of the callers have a folio at this point, update __unlock_for_delalloc to take a folio so that it's consistent with its callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert lock_delalloc_pages() to take a folio	Josef Bacik
	Also rename lock_delalloc_pages => lock_delalloc_folios in the process, now that it exclusively works on folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert find_lock_delalloc_range() to use a folio	Josef Bacik
	Instead of passing in a page for locked_page, pass in the folio instead. We only use the folio itself to validate some range assumptions, and then pass it into other functions. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert writepage_delalloc() to take a folio	Josef Bacik
	We already use a folio heavily in this function, pass the folio in directly and use it everywhere, only passing the page down to functions that do not take a folio yet. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_mark_ordered_io_finished() to take a folio	Josef Bacik
	We only need a folio now, make it take a folio as an argument and update all of the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_finish_ordered_extent() to take a folio	Josef Bacik
	The callers and callee's of this now all use folios, update it to take a folio as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert can_finish_ordered_extent() to use a folio	Josef Bacik
	Pass in a folio instead, and use a folio instead of a page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: utilize folio more in btrfs_page_mkwrite()	Josef Bacik
	We already have a folio that we're using in btrfs_page_mkwrite, update the rest of the function to use folio everywhere else. This will make it easier on Willy when he drops page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert add_ra_bio_pages() to use only folios	Josef Bacik
	Willy is going to get rid of page->index, and add_ra_bio_pages uses page->index. Make his life easier by converting add_ra_bio_pages to use folios so that we are no longer using page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert __extent_writepage() to be completely folio based	Josef Bacik
	Now that we've gotten most of the helpers updated to only take a folio, update __extent_writepage to only deal in folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert extent_write_locked_range() to use folios	Josef Bacik
	Instead of using pages for everything, find a folio and use that. This makes things a bit cleaner as a lot of the functions calls here all take folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert __extent_writepage_io() to take a folio	Josef Bacik
	__extent_writepage_io uses page everywhere, but a lot of these functions take a folio. Convert it to use the folio based helpers, and then change it to take a folio as an argument and update its callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: update the writepage tracepoint to take a folio	Josef Bacik
	Willy is wanting to get rid of page->index, convert the writepage tracepoint to take a folio so we can do folio->index instead of page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_do_readpage() to only use a folio	Josef Bacik
	Now that the callers and helpers mostly use folio, convert btrfs_do_readpage to take a folio, and rename it to btrfs_do_read_folio. Update all of the page stuff to use the folio based helpers instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert submit_extent_page() to use a folio	Josef Bacik
	The callers of this helper are going to be converted to using a folio, so adjust submit_extent_page to become submit_extent_folio and update it to use all the relevant folio helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert begin_page_folio() to take a folio instead	Josef Bacik
	This already uses a folio internally, change it to take a folio as an argument instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert end_page_read() to take a folio	Josef Bacik
	We have this helper function to set the page range uptodate once we're done reading it, as well as run fsverity against it. Half of these functions already take a folio, just rename this to end_folio_read and then rework it to take a folio instead, and update everything accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_read_folio() to only use a folio	Josef Bacik
	Currently we're using the page for everything here. Convert this to use the folio helpers instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: convert btrfs_readahead() to only use folio	Josef Bacik
	We're the only user of readahead_page_batch(). Convert btrfs_readahead() to use the folio based helpers to do readahead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: print message on device opening error during mount	Li Zhang
	[ENHANCEMENT] When mounting a btrfs filesystem, the filesystem opens the block device, and if this fails, there is no message about it. Print a message about it to help debugging. [TEST] I have a btrfs filesystem on three block devices, one of which is write-protected, so regular mounts fail, but there is no message in dmesg. /dev/vdb normal /dev/vdc write protected /dev/vdd normal Before patch: $ sudo mount /dev/vdb /mnt/ mount: mount(2) failed: no such file or directory $ sudo dmesg # Show only messages about missing block devices .... [ 352.947196] BTRFS error (device vdb): devid 2 uuid 4ee2c625-a3b2-4fe0-b411-756b23e08533 missing .... After patch: $ sudo mount /dev/vdb /mnt/ mount: mount(2) failed: no such file or directory $ sudo dmesg # Show bdev_file_open_by_path failed. .... [ 352.944328] BTRFS error: failed to open device for path /dev/vdc with flags 0x3: -13 [ 352.947196] BTRFS error (device vdb): missing devid 2 uuid 4ee2c625-a3b2-4fe0-b411-756b23e08533 .... Signed-off-by: Li Zhang <zhanglikernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: move uuid tree related code to uuid-tree.[ch]	Qu Wenruo
	Functions btrfs_uuid_scan_kthread() and btrfs_create_uuid_tree() are for UUID tree rescan and creation, it's not suitable for volumes.[ch]. Move them to uuid-tree.[ch] instead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: reduce size and overhead of extent_map_block_end()	Filipe Manana
	At extent_map_block_end() we are calling the inline functions extent_map_block_start() and extent_map_block_len() multiple times, which results in expanding their code multiple times, increasing the compiled code size and repeating the computations those functions do. Improve this by caching their results in local variables. The size of the module before this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1755770 163800 16920 1936490 1d8c6a fs/btrfs/btrfs.ko And after this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1755656 163800 16920 1936376 1d8bf8 fs/btrfs/btrfs.ko Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: update stripe_extent delete loop assumptions	Johannes Thumshirn
	btrfs_delete_raid_extent() was written under the assumption, that it's call-chain always passes a start, length tuple that matches a single extent. But btrfs_delete_raid_extent() is called by do_free_extent_accounting() which in turn is called by __btrfs_free_extent(). But this call-chain passes in a start address and a length that can possibly match multiple on-disk extents. To make this possible, we have to adjust the start and length of each btree node lookup, to not delete beyond the requested range. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: update stripe extents for existing logical addresses	Johannes Thumshirn
	Update a stripe extent in case of an already existing logical address, but with different physical addresses and/or device id instead of bailing out with EEXIST. This can happen i.e. in case of a device replace operation, where data extents get rewritten to a new disk. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	nvme-tcp: fix link failure for TCP auth	Arnd Bergmann
	The nvme fabric driver calls the nvme_tls_key_lookup() function from nvmf_parse_key() when the keyring is enabled, but this is broken in a configuration with CONFIG_NVME_FABRICS=y and CONFIG_NVME_TCP=m because this leads to the function definition being in a loadable module: x86_64-linux-ld: vmlinux.o: in function `nvmf_parse_key': fabrics.c:(.text+0xb1bdec): undefined reference to `nvme_tls_key_lookup' Move the 'select' up to CONFIG_NVME_FABRICS itself to force this part to be built-in as well if needed. Fixes: 5bc46b49c828 ("nvme-tcp: check for invalidated or revoked key") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-10	MAINTAINERS: update Pierre Bossart's email and role	Pierre-Louis Bossart
	Update to permanent address and Reviewer role. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240910143021.261261-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-10	xen: add capability to remap non-RAM pages to different PFNs	Juergen Gross
	When running as a Xen PV dom0 it can happen that the kernel is being loaded to a guest physical address conflicting with the host memory map. In order to be able to resolve this conflict, add the capability to remap non-RAM areas to different guest PFNs. A function to use this remapping information for other purposes than doing the remap will be added when needed. As the number of conflicts should be rather low (currently only machines with max. 1 conflict are known), save the remap data in a small statically allocated array. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10	ASoC: qcom: sm8250: enable primary mi2s	Jens Reidel
	When using primary mi2s on sm8250-compatible SoCs, the correct clock needs to get enabled to be able to use the mi2s interface. Signed-off-by: Jens Reidel <adrian@travitia.xyz> Tested-by: Danila Tikhonov <danila@jiaxyga.com> # sm7325-nothing-spacewar Link: https://patch.msgid.link/20240826134920.55148-2-adrian@travitia.xyz Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-10	perf trace: Add --force-btf for debugging	Howard Chu
	If --force-btf is enabled, prefer btf_dump general pretty printer to perf trace's customized pretty printers. Mostly for debug purposes. Committer testing: diff before/after shows we need several improvements to be able to compare the changes, first we need to cut off/disable mutable data such as pids and timestamps, then what is left are the buffer addresses passed from userspace, returned from kernel space, maybe we can ask 'perf trace' to go on making those reproducible. That would entail a Pointer Address Translation (PAT) like for networking, that would, for simple, reproducible if not for these details, workloads, that we would then use in our regression tests. Enough digression, this is one such diff: openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY\|CLOEXEC) = 3 -fstat(fd: 3, statbuf: 0x7fff01f212a0) = 0 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 2998 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 0 +fstat(fd: 3, statbuf: 0x7ffc163cf0e0) = 0 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 2998 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 0 close(fd: 3) = 0 openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) @@ -45,7 +45,7 @@ openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0 +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) = The problem more close to our hands is to make the libbpf BTF pretty printer to have a mode that closely resembles what we're trying to resemble: strace output. Being able to run something with 'perf trace' and with 'strace' and get the exact same output should be of interest of anybody wanting to have strace and 'perf trace' regression tested against each other. That last part is 'perf trace' shot at being something so useful as strace... ;-) Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10	perf trace: Collect augmented data using BPF	Howard Chu
	Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads TRACE_AUG_MAX_BUF bytes of buffer maximum. Determine what type of argument and how many bytes to read from user space, us ing the value in the beauty_map. This is the relation of parameter type and its corres ponding value in the beauty map, and how many bytes we read eventually: string: 1 -> size of string (till null) struct: size of struct -> size of struct buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF) After reading from user space, we output the augmented data using bpf_perf_event_output(). If the struct augmenter, augment_sys_enter() failed, we fall back to using bpf_tail_call(). I have to make the payload 6 times the size of augmented_arg, to pass the BPF verifier. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10	perf trace: Pretty print buffer data	Howard Chu
	Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum buffer size we can augment. BPF will include this header too. Print buffer in a way that's different than just printing a string, we print all the control characters in \digits (such as \0 for null, and \10 for newline, LF). For character that has a bigger value than 127, we print the digits instead of the character itself as well. Committer notes: Simplified the buffer scnprintf to avoid using multiple buffers as discussed in the patch review thread. We can't really all 'buf' args to SCA_BUF as we're collecting so far just on the sys_enter path, so we would be printing the previous 'read' arg buffer contents, not what the kernel puts there. So instead of: static int syscall_fmt__cmp(const void name, const void fmtp) @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt arg, struct tep_format_field - else if (strstr(field->type, "char ") && strstr(field->name, "buf")) - arg->scnprintf = SCA_BUF; Do: static const struct syscall_fmt syscall_fmts[] = { + { .name = "write", .errpid = true, + .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, }, Signed-off-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10	perf trace: Pretty print struct data	Howard Chu
	Change the arg->augmented.args to arg->augmented.args->value to skip the header for customized pretty printers, since we collect data in BPF using the general augment_sys_enter(), which always adds the header. Use btf_dump API to pretty print augmented struct pointer. Prefer existed pretty-printer than btf general pretty-printer. set compact = true and skip_names = true, so that no newline character and argument name are printed. Committer notes: Simplified the btf_dump_snprintf callback to avoid using multiple buffers, as discussed in the thread accessible via the Link tag below. Also made it do: dump_data_opts.skip_names = !arg->trace->show_arg_names; I.e. show the type and struct field names according to that tunable, we probably need another tunable just for this, but for now if the user wants to see syscall names in addition to its value, it makes sense to see the struct field names according to that tunable. Committer testing: The following have explicitely set beautifiers (SCA_FILENAME, SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we have been wiring up the "renameat2" ("renameat" until recently), so it doesn't use the introduced generic fallback (btf_struct_scnprintf(), see the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature rich beautifiers, that are not using BTF): root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654 0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com 0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT\|NOSIGNAL) = 97 18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 PING www.google.com (142.251.129.68) 56(84) bytes of data. 18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0 18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0 18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64 63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT\|NOSIGNAL) = 110 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_entersleep sleep 1.23456789 0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER\|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER\|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5 1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6 1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10 2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 (syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11 Performance counter stats for 'sleep 1.23456789': 1,402,839 cpu_atom/instructions/ <not counted> cpu_core/instructions/ (0.00%) 11,066 cpu_atom/cache-misses/ <not counted> cpu_core/cache-misses/ (0.00%) 0 syscalls:sys_enter_nanosleep 1 syscalls:sys_enter_clock_nanosleep 1.236246714 seconds time elapsed 0.000000000 seconds user 0.001308000 seconds sys root@number:~# Now if we use it even for the ones we have a specific beautifier in tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all structs, by adding the following patch: @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall sc, char bf, size_t size, default_scnprintf = sc->arg_fmt[arg.idx].scnprintf; - if (default_scnprintf == NULL \|\| default_scnprintf == SCA_PTR) { + if (1 \|\| (default_scnprintf == NULL \|\| default_scnprintf == SCA_PTR)) { btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed, size - printed, val, field->type); if (btf_printed) { We get: root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com PING www.google.com (142.251.129.68) 56(84) bytes of data. 0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT\|NOSIGNAL) = 97 0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20 0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0 0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms 44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT\|NOSIGNAL) = 110 root@number:~# Which looks sane, i.e.: 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 Becomes: 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 And. #define AF_UNIX 1 /* Unix domain sockets / #define AF_LOCAL 1 / POSIX name for AF_UNIX / #define AF_INET 2 / Internet IP Protocol / <SNIP> #define AF_INET6 10 / IP version 6 */ And 'D' == 68, so the preexisting sockaddr BPF collector is working with the new generic BTF pretty printer (btf_struct_scnprintf()), its just that it doesn't know about 'struct sockaddr' besides what is in BTF, i.e. its an array of bytes, not an IPv4 address that needs extra massaging. Ditto for the 'struct perf_event_attr' case: 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 Becomes: 2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 hex(17179869187) = 0x400000003, etc. read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING is enum perf_event_read_format { PERF_FORMAT_TOTAL_TIME_ENABLED = 1U << 0, PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1, and so on. We need to work with the libbpf btf dump api to get one output that matches the 'perf trace'/strace expectations/format, but having this in this current form is already an improvement to 'perf trace', so lets improve from what we have. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>