summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-10btrfs: convert struct btrfs_writepage_fixup to use a folioJosef Bacik
Now the fixup creator and consumer use folios, change this to use a folio as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_writepage_cow_fixup() to use folioJosef Bacik
Instead of a page, use a folio for btrfs_writepage_cow_fixup. We already have a folio at the only caller, and the fixup worker uses folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_writepage_fixup_worker() to use a folioJosef Bacik
This function heavily messes with pages, instead update it to use a folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert submit_uncompressed_range() to take a folioJosef Bacik
This mostly uses folios already, update it to take a folio and update the rest of the function to use the folio instead of the page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert struct async_chunk to hold a folioJosef Bacik
Instead of passing in the page for ->locked_page, make it hold a locked_folio and then update the users of async_chunk to act accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_run_delalloc_range() to take a folioJosef Bacik
Now that every function that btrfs_run_delalloc_range calls takes a folio, update it to take a folio and update the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert run_delalloc_compressed() to take a folioJosef Bacik
This just passes the page into the compressed machinery to keep track of the locked page. Update this to take a folio and convert it to a page where appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_cleanup_ordered_extents() to take a folioJosef Bacik
Now that btrfs_cleanup_ordered_extents is operating mostly with folios, update it to use a folio instead of a page, and the update the function and the callers as appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_cleanup_ordered_extents() to use foliosJosef Bacik
We walk through pages in this function and clear ordered, and the function for this uses folios. Update the function to use a folio for this whole operation. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert run_delalloc_nocow() to take a folioJosef Bacik
Now all of the functions that use locked_page in run_delalloc_nocow take a folio, update it to take a folio and update the caller. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert fallback_to_cow() to take a folioJosef Bacik
With this we can pass the folio directly into cow_file_range(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert cow_file_range() to take a folioJosef Bacik
Convert this to take a folio and pass it into all of the various cleanup functions. Update the callers to pass in a folio instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert cow_file_range_inline() to take a folioJosef Bacik
Now that we want the folio in this function, convert it to take a folio directly and use that. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert run_delalloc_cow() to take a folioJosef Bacik
We pass the folio into extent_write_locked_range, go ahead and take a folio to pass along, and update the callers to pass in a folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert extent_write_locked_range() to take a folioJosef Bacik
This mostly uses folios, convert it to take a folio instead and update the callers to pass in the folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert extent_clear_unlock_delalloc() to take a folioJosef Bacik
Instead of taking the locked page, take the locked folio so we can pass that into __process_folios_contig. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert process_one_page() to operate only on foliosJosef Bacik
Now that this mostly uses folios, update it to take folios, use the folios that are passed in, and rename from process_one_page => process_one_folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert __process_pages_contig() to take a folioJosef Bacik
This operates mostly on folios, update it to take a folio for the locked folio instead of the page, rename from __process_pages_contig => __process_folios_contig. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert __unlock_for_delalloc() to take a folioJosef Bacik
All of the callers have a folio at this point, update __unlock_for_delalloc to take a folio so that it's consistent with its callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert lock_delalloc_pages() to take a folioJosef Bacik
Also rename lock_delalloc_pages => lock_delalloc_folios in the process, now that it exclusively works on folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert find_lock_delalloc_range() to use a folioJosef Bacik
Instead of passing in a page for locked_page, pass in the folio instead. We only use the folio itself to validate some range assumptions, and then pass it into other functions. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert writepage_delalloc() to take a folioJosef Bacik
We already use a folio heavily in this function, pass the folio in directly and use it everywhere, only passing the page down to functions that do not take a folio yet. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_mark_ordered_io_finished() to take a folioJosef Bacik
We only need a folio now, make it take a folio as an argument and update all of the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_finish_ordered_extent() to take a folioJosef Bacik
The callers and callee's of this now all use folios, update it to take a folio as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert can_finish_ordered_extent() to use a folioJosef Bacik
Pass in a folio instead, and use a folio instead of a page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: utilize folio more in btrfs_page_mkwrite()Josef Bacik
We already have a folio that we're using in btrfs_page_mkwrite, update the rest of the function to use folio everywhere else. This will make it easier on Willy when he drops page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert add_ra_bio_pages() to use only foliosJosef Bacik
Willy is going to get rid of page->index, and add_ra_bio_pages uses page->index. Make his life easier by converting add_ra_bio_pages to use folios so that we are no longer using page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert __extent_writepage() to be completely folio basedJosef Bacik
Now that we've gotten most of the helpers updated to only take a folio, update __extent_writepage to only deal in folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert extent_write_locked_range() to use foliosJosef Bacik
Instead of using pages for everything, find a folio and use that. This makes things a bit cleaner as a lot of the functions calls here all take folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert __extent_writepage_io() to take a folioJosef Bacik
__extent_writepage_io uses page everywhere, but a lot of these functions take a folio. Convert it to use the folio based helpers, and then change it to take a folio as an argument and update its callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: update the writepage tracepoint to take a folioJosef Bacik
Willy is wanting to get rid of page->index, convert the writepage tracepoint to take a folio so we can do folio->index instead of page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_do_readpage() to only use a folioJosef Bacik
Now that the callers and helpers mostly use folio, convert btrfs_do_readpage to take a folio, and rename it to btrfs_do_read_folio. Update all of the page stuff to use the folio based helpers instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert submit_extent_page() to use a folioJosef Bacik
The callers of this helper are going to be converted to using a folio, so adjust submit_extent_page to become submit_extent_folio and update it to use all the relevant folio helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert begin_page_folio() to take a folio insteadJosef Bacik
This already uses a folio internally, change it to take a folio as an argument instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert end_page_read() to take a folioJosef Bacik
We have this helper function to set the page range uptodate once we're done reading it, as well as run fsverity against it. Half of these functions already take a folio, just rename this to end_folio_read and then rework it to take a folio instead, and update everything accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_read_folio() to only use a folioJosef Bacik
Currently we're using the page for everything here. Convert this to use the folio helpers instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_readahead() to only use folioJosef Bacik
We're the only user of readahead_page_batch(). Convert btrfs_readahead() to use the folio based helpers to do readahead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: print message on device opening error during mountLi Zhang
[ENHANCEMENT] When mounting a btrfs filesystem, the filesystem opens the block device, and if this fails, there is no message about it. Print a message about it to help debugging. [TEST] I have a btrfs filesystem on three block devices, one of which is write-protected, so regular mounts fail, but there is no message in dmesg. /dev/vdb normal /dev/vdc write protected /dev/vdd normal Before patch: $ sudo mount /dev/vdb /mnt/ mount: mount(2) failed: no such file or directory $ sudo dmesg # Show only messages about missing block devices .... [ 352.947196] BTRFS error (device vdb): devid 2 uuid 4ee2c625-a3b2-4fe0-b411-756b23e08533 missing .... After patch: $ sudo mount /dev/vdb /mnt/ mount: mount(2) failed: no such file or directory $ sudo dmesg # Show bdev_file_open_by_path failed. .... [ 352.944328] BTRFS error: failed to open device for path /dev/vdc with flags 0x3: -13 [ 352.947196] BTRFS error (device vdb): missing devid 2 uuid 4ee2c625-a3b2-4fe0-b411-756b23e08533 .... Signed-off-by: Li Zhang <zhanglikernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: move uuid tree related code to uuid-tree.[ch]Qu Wenruo
Functions btrfs_uuid_scan_kthread() and btrfs_create_uuid_tree() are for UUID tree rescan and creation, it's not suitable for volumes.[ch]. Move them to uuid-tree.[ch] instead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: reduce size and overhead of extent_map_block_end()Filipe Manana
At extent_map_block_end() we are calling the inline functions extent_map_block_start() and extent_map_block_len() multiple times, which results in expanding their code multiple times, increasing the compiled code size and repeating the computations those functions do. Improve this by caching their results in local variables. The size of the module before this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1755770 163800 16920 1936490 1d8c6a fs/btrfs/btrfs.ko And after this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1755656 163800 16920 1936376 1d8bf8 fs/btrfs/btrfs.ko Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: update stripe_extent delete loop assumptionsJohannes Thumshirn
btrfs_delete_raid_extent() was written under the assumption, that it's call-chain always passes a start, length tuple that matches a single extent. But btrfs_delete_raid_extent() is called by do_free_extent_accounting() which in turn is called by __btrfs_free_extent(). But this call-chain passes in a start address and a length that can possibly match multiple on-disk extents. To make this possible, we have to adjust the start and length of each btree node lookup, to not delete beyond the requested range. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: update stripe extents for existing logical addressesJohannes Thumshirn
Update a stripe extent in case of an already existing logical address, but with different physical addresses and/or device id instead of bailing out with EEXIST. This can happen i.e. in case of a device replace operation, where data extents get rewritten to a new disk. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10nvme-tcp: fix link failure for TCP authArnd Bergmann
The nvme fabric driver calls the nvme_tls_key_lookup() function from nvmf_parse_key() when the keyring is enabled, but this is broken in a configuration with CONFIG_NVME_FABRICS=y and CONFIG_NVME_TCP=m because this leads to the function definition being in a loadable module: x86_64-linux-ld: vmlinux.o: in function `nvmf_parse_key': fabrics.c:(.text+0xb1bdec): undefined reference to `nvme_tls_key_lookup' Move the 'select' up to CONFIG_NVME_FABRICS itself to force this part to be built-in as well if needed. Fixes: 5bc46b49c828 ("nvme-tcp: check for invalidated or revoked key") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-10MAINTAINERS: update Pierre Bossart's email and rolePierre-Louis Bossart
Update to permanent address and Reviewer role. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240910143021.261261-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-10xen: add capability to remap non-RAM pages to different PFNsJuergen Gross
When running as a Xen PV dom0 it can happen that the kernel is being loaded to a guest physical address conflicting with the host memory map. In order to be able to resolve this conflict, add the capability to remap non-RAM areas to different guest PFNs. A function to use this remapping information for other purposes than doing the remap will be added when needed. As the number of conflicts should be rather low (currently only machines with max. 1 conflict are known), save the remap data in a small statically allocated array. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10ASoC: qcom: sm8250: enable primary mi2sJens Reidel
When using primary mi2s on sm8250-compatible SoCs, the correct clock needs to get enabled to be able to use the mi2s interface. Signed-off-by: Jens Reidel <adrian@travitia.xyz> Tested-by: Danila Tikhonov <danila@jiaxyga.com> # sm7325-nothing-spacewar Link: https://patch.msgid.link/20240826134920.55148-2-adrian@travitia.xyz Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-10perf trace: Add --force-btf for debuggingHoward Chu
If --force-btf is enabled, prefer btf_dump general pretty printer to perf trace's customized pretty printers. Mostly for debug purposes. Committer testing: diff before/after shows we need several improvements to be able to compare the changes, first we need to cut off/disable mutable data such as pids and timestamps, then what is left are the buffer addresses passed from userspace, returned from kernel space, maybe we can ask 'perf trace' to go on making those reproducible. That would entail a Pointer Address Translation (PAT) like for networking, that would, for simple, reproducible if not for these details, workloads, that we would then use in our regression tests. Enough digression, this is one such diff: openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3 -fstat(fd: 3, statbuf: 0x7fff01f212a0) = 0 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 2998 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 0 +fstat(fd: 3, statbuf: 0x7ffc163cf0e0) = 0 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 2998 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 0 close(fd: 3) = 0 openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) @@ -45,7 +45,7 @@ openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0 +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) = The problem more close to our hands is to make the libbpf BTF pretty printer to have a mode that closely resembles what we're trying to resemble: strace output. Being able to run something with 'perf trace' and with 'strace' and get the exact same output should be of interest of anybody wanting to have strace and 'perf trace' regression tested against each other. That last part is 'perf trace' shot at being something so useful as strace... ;-) Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Collect augmented data using BPFHoward Chu
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads TRACE_AUG_MAX_BUF bytes of buffer maximum. Determine what type of argument and how many bytes to read from user space, us ing the value in the beauty_map. This is the relation of parameter type and its corres ponding value in the beauty map, and how many bytes we read eventually: string: 1 -> size of string (till null) struct: size of struct -> size of struct buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF) After reading from user space, we output the augmented data using bpf_perf_event_output(). If the struct augmenter, augment_sys_enter() failed, we fall back to using bpf_tail_call(). I have to make the payload 6 times the size of augmented_arg, to pass the BPF verifier. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Pretty print buffer dataHoward Chu
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum buffer size we can augment. BPF will include this header too. Print buffer in a way that's different than just printing a string, we print all the control characters in \digits (such as \0 for null, and \10 for newline, LF). For character that has a bigger value than 127, we print the digits instead of the character itself as well. Committer notes: Simplified the buffer scnprintf to avoid using multiple buffers as discussed in the patch review thread. We can't really all 'buf' args to SCA_BUF as we're collecting so far just on the sys_enter path, so we would be printing the previous 'read' arg buffer contents, not what the kernel puts there. So instead of: static int syscall_fmt__cmp(const void *name, const void *fmtp) @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field - else if (strstr(field->type, "char *") && strstr(field->name, "buf")) - arg->scnprintf = SCA_BUF; Do: static const struct syscall_fmt syscall_fmts[] = { + { .name = "write", .errpid = true, + .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, }, Signed-off-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Pretty print struct dataHoward Chu
Change the arg->augmented.args to arg->augmented.args->value to skip the header for customized pretty printers, since we collect data in BPF using the general augment_sys_enter(), which always adds the header. Use btf_dump API to pretty print augmented struct pointer. Prefer existed pretty-printer than btf general pretty-printer. set compact = true and skip_names = true, so that no newline character and argument name are printed. Committer notes: Simplified the btf_dump_snprintf callback to avoid using multiple buffers, as discussed in the thread accessible via the Link tag below. Also made it do: dump_data_opts.skip_names = !arg->trace->show_arg_names; I.e. show the type and struct field names according to that tunable, we probably need another tunable just for this, but for now if the user wants to see syscall names in addition to its value, it makes sense to see the struct field names according to that tunable. Committer testing: The following have explicitely set beautifiers (SCA_FILENAME, SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we have been wiring up the "renameat2" ("renameat" until recently), so it doesn't use the introduced generic fallback (btf_struct_scnprintf(), see the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature rich beautifiers, that are not using BTF): root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654 0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com 0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97 18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 PING www.google.com (142.251.129.68) 56(84) bytes of data. 18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0 18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0 18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64 63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT|NOSIGNAL) = 110 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_enter*sleep* sleep 1.23456789 0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5 1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6 1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10 2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 (syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11 Performance counter stats for 'sleep 1.23456789': 1,402,839 cpu_atom/instructions/ <not counted> cpu_core/instructions/ (0.00%) 11,066 cpu_atom/cache-misses/ <not counted> cpu_core/cache-misses/ (0.00%) 0 syscalls:sys_enter_nanosleep 1 syscalls:sys_enter_clock_nanosleep 1.236246714 seconds time elapsed 0.000000000 seconds user 0.001308000 seconds sys root@number:~# Now if we use it even for the ones we have a specific beautifier in tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all structs, by adding the following patch: @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size, default_scnprintf = sc->arg_fmt[arg.idx].scnprintf; - if (default_scnprintf == NULL || default_scnprintf == SCA_PTR) { + if (1 || (default_scnprintf == NULL || default_scnprintf == SCA_PTR)) { btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed, size - printed, val, field->type); if (btf_printed) { We get: root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com PING www.google.com (142.251.129.68) 56(84) bytes of data. 0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97 0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20 0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0 0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms 44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT|NOSIGNAL) = 110 root@number:~# Which looks sane, i.e.: 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 Becomes: 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 And. #define AF_UNIX 1 /* Unix domain sockets */ #define AF_LOCAL 1 /* POSIX name for AF_UNIX */ #define AF_INET 2 /* Internet IP Protocol */ <SNIP> #define AF_INET6 10 /* IP version 6 */ And 'D' == 68, so the preexisting sockaddr BPF collector is working with the new generic BTF pretty printer (btf_struct_scnprintf()), its just that it doesn't know about 'struct sockaddr' besides what is in BTF, i.e. its an array of bytes, not an IPv4 address that needs extra massaging. Ditto for the 'struct perf_event_attr' case: 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 Becomes: 2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 hex(17179869187) = 0x400000003, etc. read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING is enum perf_event_read_format { PERF_FORMAT_TOTAL_TIME_ENABLED = 1U << 0, PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1, and so on. We need to work with the libbpf btf dump api to get one output that matches the 'perf trace'/strace expectations/format, but having this in this current form is already an improvement to 'perf trace', so lets improve from what we have. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>