summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-18btrfs: drop incompat bit for raid1c34 after last block group is goneDavid Sterba
When there are no raid1c3 or raid1c4 block groups left after balance (either convert or with other filters applied), remove the incompat bit. This is already done for RAID56, do the same for RAID1C34. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add incompat for raid1 with 3, 4 copiesDavid Sterba
The new raid1c3 and raid1c4 profiles are backward incompatible and the name shall be 'raid1c34', the status can be found in the global supported features in /sys/fs/btrfs/features or in the per-filesystem directory. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add support for 4-copy replication (raid1c4)David Sterba
Add new block group profile to store 4 copies in a simliar way that current RAID1 does. The profile attributes and constraints are defined in the raid table and used by the same code that already handles the 2- and 3-copy RAID1. The minimum number of devices is 4, the maximum number of devices/chunks that can be lost/damaged is 3. There is no comparable traditional RAID level, the profile is added for future needs to accompany triple-parity and beyond. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add support for 3-copy replication (raid1c3)David Sterba
Add new block group profile to store 3 copies in a simliar way that current RAID1 does. The profile attributes and constraints are defined in the raid table and used by the same code that already handles the 2-copy RAID1. The minimum number of devices is 3, the maximum number of devices/chunks that can be lost/damaged is 2. Like RAID6 but with 33% space utilization. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: sink write flags to cow_file_range_asyncDavid Sterba
In commit "Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios", cow_file_range_async gained wbc as a parameter and this makes passing write flags redundant. Set it inside the function and remove the parameter. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: sink write_flags to __extent_writepage_ioDavid Sterba
__extent_writepage reads write flags from wbc and passes both to __extent_writepage_io. This makes write_flags redundant and we can remove it. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18Btrfs: send, skip backreference walking for extents with many referencesFilipe Manana
Backreference walking, which is used by send to figure if it can issue clone operations instead of write operations, can be very slow and use too much memory when extents have many references. This change simply skips backreference walking when an extent has more than 64 references, in which case we fallback to a write operation instead of a clone operation. This limit is conservative and in practice I observed no signicant slowdown with up to 100 references and still low memory usage up to that limit. This is a temporary workaround until there are speedups in the backref walking code, and as such it does not attempt to add extra interfaces or knobs to tweak the threshold. Reported-by: Atemu <atemu.main@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAE4GHgkvqVADtS4AzcQJxo0Q1jKQgKaW3JGp3SGdoinVo=C9eQ@mail.gmail.com/T/#me55dc0987f9cc2acaa54372ce0492c65782be3fa CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18Btrfs: send, allow clone operations within the same fileFilipe Manana
For send we currently skip clone operations when the source and destination files are the same. This is so because clone didn't support this case in its early days, but support for it was added back in May 2013 by commit a96fbc72884fcb ("Btrfs: allow file data clone within a file"). This change adds support for it. Example: $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt/sdd $ xfs_io -f -c "pwrite -S 0xab -b 64K 0 64K" /mnt/sdd/foobar $ xfs_io -c "reflink /mnt/sdd/foobar 0 64K 64K" /mnt/sdd/foobar $ btrfs subvolume snapshot -r /mnt/sdd /mnt/sdd/snap $ mkfs.btrfs -f /dev/sde $ mount /dev/sde /mnt/sde $ btrfs send /mnt/sdd/snap | btrfs receive /mnt/sde Without this change file foobar at the destination has a single 128Kb extent: $ filefrag -v /mnt/sde/snap/foobar Filesystem type is: 9123683e File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 31: 0.. 31: 32: last,unknown_loc,delalloc,eof /mnt/sde/snap/foobar: 1 extent found With this we get a single 64Kb extent that is shared at file offsets 0 and 64K, just like in the source filesystem: $ filefrag -v /mnt/sde/snap/foobar Filesystem type is: 9123683e File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 15: 3328.. 3343: 16: shared 1: 16.. 31: 3328.. 3343: 16: 3344: last,shared,eof /mnt/sde/snap/foobar: 2 extents found Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Ensure we trim ranges across block group boundaryQu Wenruo
[BUG] When deleting large files (which cross block group boundary) with discard mount option, we find some btrfs_discard_extent() calls only trimmed part of its space, not the whole range: btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50% type: bbio->map_type, in above case, it's SINGLE DATA. start: Logical address of this trim len: Logical length of this trim trimmed: Physically trimmed bytes ratio: trimmed / len Thus leaving some unused space not discarded. [CAUSE] When discard mount option is specified, after a transaction is fully committed (super block written to disk), we begin to cleanup pinned extents in the following call chain: btrfs_commit_transaction() |- btrfs_finish_extent_commit() |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY); |- btrfs_discard_extent() However, pinned extents are recorded in an extent_io_tree, which can merge adjacent extent states. When a large file gets deleted and it has adjacent file extents across block group boundary, we will get a large merged range like this: |<--- BG1 --->|<--- BG2 --->| |//////|<-- Range to discard --->|/////| To discard that range, we have the following calls: btrfs_discard_extent() |- btrfs_map_block() | Returned bbio will end at BG1's end. As btrfs_map_block() | never returns result across block group boundary. |- btrfs_issuse_discard() Issue discard for each stripe. So we will only discard the range in BG1, not the remaining part in BG2. Furthermore, this bug is not that reliably observed, for above case, if there is no other extent in BG2, BG2 will be empty and btrfs will trim all space of BG2, covering up the bug. [FIX] - Allow __btrfs_map_block_for_discard() to modify @length parameter btrfs_map_block() uses its @length paramter to notify the caller how many bytes are mapped in current call. With __btrfs_map_block_for_discard() also modifing the @length, btrfs_discard_extent() now understands when to do extra trim. - Call btrfs_map_block() in a loop until we hit the range end Since we now know how many bytes are mapped each time, we can iterate through each block group boundary and issue correct trim for each range. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: volumes: Use more straightforward way to calculate map lengthQu Wenruo
The old code goes: offset = logical - em->start; length = min_t(u64, em->len - offset, length); Where @length calculation is dependent on offset, it can take reader several more seconds to find it's just the same code as: offset = logical - em->start; length = min_t(u64, em->start + em->len - logical, length); Use above code to make the length calculate independent from other variable, thus slightly increase the readability. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Check item size before reading file extent typeQu Wenruo
In check_extent_data_item(), we read file extent type without verifying if the item size is valid. Add such check to ensure the file extent type we read is correct. The check is not as accurate as we need to cover both inline and regular extents, so it only checks if the item size is larger or equal to inline header. So the existing size checks on inline/regular extents are still needed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: clean up locking name in scrub_enumerate_chunks()Dan Carpenter
The "&fs_info->dev_replace.rwsem" and "&dev_replace->rwsem" refer to the same lock but Smatch is not clever enough to figure that out so it leads to static checker warnings. It's better to use it consistently anyway. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Streamline btrfs_fs_info::backup_root_index semanticsNikolay Borisov
The backup_root_index member stores the index at which the backup root should be saved upon next transaction commit. However, there is a small deviation from this behavior in the form of a check in backup_super_roots which checks if current root generation equals to the generation of the previous root. This can trigger in the following scenario: slot0: gen-2 slot1: gen-1 slot2: gen slot3: unused Now suppose slot3 (which is also the root specified in the super block) is corrupted hence init_tree_roots chooses to use the backup root at slot2, meaning read_backup_root will read slot2 and assign the superblock generation to gen-1. Despite this backup_root_index will point at slot3 because its init happens in init_backup_root_slot, long before any parsing of the backup roots occur. Then on next transaction start, gen-1 will be incremented by 1 making the root's generation equal gen. Subsequently, on transaction commit the following check triggers: if (btrfs_backup_tree_root_gen(root_backup) == btrfs_header_generation(info->tree_root->node)) This causes the 'next_backup', which is the index at which the backup is going to be written to, to set to last_backup, which will be slot2. All of this is a very confusing way of expressing the following invariant: Always write a backup root at the index following the last used backup root. This commit streamlines this logic by setting backup_root_index to the next index after the one used for mount. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Rename find_oldest_super_backup to init_backup_root_slotNikolay Borisov
The old name name was an awful misnomer because it didn't really find the oldest super backup per-se but rather its slot. For example if we have: slot0: gen - 2 slot1: gen - 1 slot2: gen slot3: empty init_backup_root_slot will return slot3 and not slot0. The new name is more appropriate since the function doesn't care whether there is a valid backup in the returned slot or not. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Remove unused next_root_backup functionNikolay Borisov
This function has been superseded by previous commits and is no longer used so just remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Don't use objectid_mutex during mountNikolay Borisov
Since the filesystem is not well formed and no trees are loaded it's pointless holding the objectid_mutex. Just remove its usage. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Factor out tree roots initialization during mountNikolay Borisov
The code responsible for reading and initializing tree roots is scattered in open_ctree among 2 labels, emulating a loop. This is rather confusing to reason about. Instead, factor the code to a new function, init_tree_roots which implements the same logical flow. There are a couple of notable differences, namely: * Instead of using next_backup_root it's using the newly introduced read_backup_root. * If read_backup_root returns an error init_tree_roots propagates the error and there is no special handling of that case e.g. the code jumps straight to 'fail_tree_roots' label. The old code, however, was (erroneously) jumping to 'fail_block_groups' label if next_backup_root did fail, this was unnecessary since the tree roots init logic doesn't modify the state of block groups. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Add read_backup_rootNikolay Borisov
This function will replace next_root_backup with a much saner/cleaner interface. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Remove newest_gen argument from find_oldest_super_backupNikolay Borisov
It's no longer needed following cleanups around find_newest_backup_root Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Cleanup and simplify find_newest_super_backupNikolay Borisov
Backup roots are always written in a circular manner. By definition we can only ever have 1 backup root whose generation equals to that of the superblock. Hence, the 'if' in the for loop will trigger at most once. This is sufficient to return the newest backup root. Furthermore the newest_gen parameter is always set to the generation of the superblock. This value can be obtained from the fs_info. This patch removes the unnecessary code dealing with the wraparound case and makes 'newest_gen' a local variable. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18Btrfs: remove unnecessary delalloc mutex for inodesFilipe Manana
The inode delalloc mutex was added a long time ago by commit f248679e86fea ("Btrfs: add a delalloc mutex to inodes for delalloc reservations"), and the reason for its introduction is not very clear from the change log. It claims it solves bogus warnings from lockdep, however it lacks an example report/warning from lockdep, or any explanation. Since we have enough concurrentcy protection from the locks of the space info and block reserve objects, and such lockdep warnings don't seem to exist anymore (at least on a 5.3 kernel I couldn't get them with fstests, ltp, fs_mark, etc), remove it, simplifying things a bit and decreasing the size of the btrfs_inode structure. With some quick fio tests doing direct IO and mmap writes I couldn't observe any significant performance increase either (direct IO writes that don't increase the file's size don't hold the inode's lock for their entire duration and mmap writes don't hold the inode's lock at all), which are the only type of writes that could see any performance gain due to less serialization. Review feedback from Josef: The problem was taking the i_mutex in mmap, which is how I was protecting delalloc reservations originally. The delalloc mutex didn't come with all of the other dependencies. That's what the lockdep messages were about, removing the lock isn't going to make them appear again. We _had_ to lock around this because we used to do tricks to keep from over-reserving, and if we didn't serialize delalloc reservations we'd end up with ugly accounting problems when we tried to clean things up. However with my recentish changes this isn't the case anymore. Every operation is responsible for reserving its space, and then adding it to the inode. Then cleaning up is straightforward and can't be mucked up by other users. So we no longer need the delalloc mutex to safe us from ourselves. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18Btrfs: remove wait queue from space_info structureFilipe Manana
It is not used anymore since commit 957780eb2788d8 ("Btrfs: introduce ticketed enospc infrastructure"), so just remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: remove cached space_info in btrfs_statfs()Johannes Thumshirn
In btrfs_statfs() we cache fs_info::space_info in a local variable only to use it once in a list_for_each_rcu() statement. Not only is the local variable unnecessary it even makes the code harder to follow as it's not clear which list it is iterating. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add dedicated members for start and length of a block groupDavid Sterba
The on-disk format of block group item makes use of the key that stores the offset and length. This is further used in the code, although this makes thing harder to understand. The key is also packed so the offset/length is not properly aligned as u64. Add start (key.objectid) and length (key.offset) members to block group and remove the embedded key. When the item is searched or written, a local variable for key is used. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: rename extent buffer block group item accessorsDavid Sterba
Accessors defined by BTRFS_SETGET_FUNCS take a raw extent buffer and manipulate the items there, there's no special prefix required. The block group accessors had _disk_ because previously the names were occupied by the on-stack accessors. As this has been addressed in the previous patch, we can now unify the naming. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: rename block_group_item on-stack accessors to follow namingDavid Sterba
All accessors defined by BTRFS_SETGET_STACK_FUNCS contain _stack_ in the name, the block group ones were not following that scheme, so let's switch them. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: remove embedded block_group_cache::itemDavid Sterba
The members ::used and ::flags are now in the block group cache structure, the last one is chunk_objectid, but that's set to a fixed value and otherwise unused. The item is constructed from a local variable before write, so we can remove the embedded one from block group. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: move block_group_item::flags to block groupDavid Sterba
The flags are read from the item that's embedded to block group struct, but the item will be removed. Use the ::flags after read and before write. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: move block_group_item::used to block groupDavid Sterba
For unknown reasons, the member 'used' in the block group struct is stored in the b-tree item and accessed everywhere using the special accessor helper. Let's unify it and make it a regular member and only update the item before writing it to the tree. The item is still being used for flags and chunk_objectid, there's some duplication until the item is removed in following patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: Remove btrfs_bio::flags memberQu Wenruo
The last user of btrfs_bio::flags was removed in commit 326e1dbb5736 ("block: remove management of bi_remaining when restoring original bi_end_io"), remove it. (Tagged for stable as the structure is heavily used and space savings are desirable.) CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add blake2b to checksumming algorithmsDavid Sterba
Add blake2b (with 256 bit digest) to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add member for a specific checksum driverDavid Sterba
Currently all the checksum algorithms generate a fixed size digest size and we use it. The on-disk format can hold up to BTRFS_CSUM_SIZE bytes and BLAKE2b produces digest of 512 bits by default. We can't do that and will use the blake2b-256, this needs to be passed to the crypto API. Separate that from the base algorithm name and add a member to request specific driver, in this case with the digest size. The only place that uses the driver name is the crypto API setup. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: sysfs: show used checksum driver per filesystemJohannes Thumshirn
Show the used driver for the checksum algorithm for the filesystem in sysfs file /sys/fs/btrfs/UUID/features/checksum, eg. crc32c (crc32c-generic) Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: sysfs: export supported checksumsDavid Sterba
Export supported checksum algorithms via sysfs in the list of static features: /sys/fs/btrfs/features/supported_checksums Space spearated list of checksum algorithm names. Co-developed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add sha256 to checksumming algorithmJohannes Thumshirn
Add sha256 to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add xxhash64 to checksumming algorithmsJohannes Thumshirn
Add xxhash64 to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18block: sed-opal: Introduce SUM_SET_LIST parameter and append it using ↵Revanth Rajashekar
'add_token_u64' In function 'activate_lsp', rather than hard-coding the short atom header(0x83), we need to let the function 'add_short_atom_header' append the header based on the parameter being appended. The parameter has been defined in Section 3.1.2.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18libtraceevent: Fix parsing of event %o and %X argument typesKonstantin Khlebnikov
Add missing "%o" and "%X". Ext4 events use "%o" for printing i_mode. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Link: http://lore.kernel.org/lkml/157338066113.6548.11461421296091086041.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18perf callchain: Fix segfault in thread__resolve_callchain_sample()Adrian Hunter
Do not dereference 'chain' when it is NULL. $ perf record -e intel_pt//u -e branch-misses:u uname $ perf report --itrace=l --branch-history perf: Segmentation fault Fixes: e9024d519d89 ("perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191114142538.4097-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18perf map_groups: Auto sort maps by name, if neededArnaldo Carvalho de Melo
There are still lots of lookups by name, even if just when loading vmlinux, till that code is studied to figure out if its possible to do away with those map lookup by names, provide a way to sort it using libc's qsort/bsearch. Doing it at the first lookup defers the sorting a bit, and as the code stands now, is never done for user maps, just for the kernel ones. # perf probe -l # perf probe -x ~/bin/perf -L __map_groups__find_by_name <__map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 static struct map *__map_groups__find_by_name(struct map_groups *mg, const char *name) 1 { struct map **mapp; 4 if (mg->maps_by_name == NULL && 5 map__groups__sort_by_name_from_rbtree(mg)) 6 return NULL; 8 mapp = bsearch(name, mg->maps_by_name, mg->nr_maps, sizeof(*mapp), map__strcmp_name); 9 if (mapp) 10 return *mapp; 11 return NULL; 12 } struct map *map_groups__find_by_name(struct map_groups *mg, const char *name) { # perf probe -x ~/bin/perf 'found=__map_groups__find_by_name:10 name:string' Added new event: probe_perf:found (on __map_groups__find_by_name:10 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:found -aR sleep 1 # # perf probe -x ~/bin/perf -L map_groups__find_by_name <map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 struct map *map_groups__find_by_name(struct map_groups *mg, const char *name) 1 { 2 struct maps *maps = &mg->maps; struct map *map; 5 down_read(&maps->lock); 7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) { 8 map = mg->last_search_by_name; 9 goto out_unlock; } /* * If we have mg->maps_by_name, then the name isn't in the rbtree, * as mg->maps_by_name mirrors the rbtree when lookups by name are * made. */ 16 map = __map_groups__find_by_name(mg, name); 17 if (map || mg->maps_by_name != NULL) 18 goto out_unlock; /* Fallback to traversing the rbtree... */ 21 maps__for_each_entry(maps, map) 22 if (strcmp(map->dso->short_name, name) == 0) { 23 mg->last_search_by_name = map; 24 goto out_unlock; } 27 map = NULL; out_unlock: 30 up_read(&maps->lock); 31 return map; 32 } int dso__load_vmlinux(struct dso *dso, struct map *map, const char *vmlinux, bool vmlinux_allocated) # perf probe -x ~/bin/perf 'fallback=map_groups__find_by_name:21 name:string' Added new events: probe_perf:fallback (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) probe_perf:fallback_1 (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:fallback_1 -aR sleep 1 # # perf probe -l probe_perf:fallback (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:fallback_1 (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:found (on __map_groups__find_by_name:10@util/symbol.c in /home/acme/bin/perf with name_string) # # perf stat -e probe_perf:* Now run 'perf top' in another term and then, after a while, stop 'perf stat': Furthermore, if we ask for interval printing, we can see that that is done just at the start of the workload: # perf stat -I1000 -e probe_perf:* # time counts unit events 1.000319513 0 probe_perf:found 1.000319513 0 probe_perf:fallback_1 1.000319513 0 probe_perf:fallback 2.001868092 23,251 probe_perf:found 2.001868092 0 probe_perf:fallback_1 2.001868092 0 probe_perf:fallback 3.002901597 0 probe_perf:found 3.002901597 0 probe_perf:fallback_1 3.002901597 0 probe_perf:fallback 4.003358591 0 probe_perf:found 4.003358591 0 probe_perf:fallback_1 4.003358591 0 probe_perf:fallback ^C # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c5lmbyr14x448rcfii7y6t3k@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18perf machine: No need to check if kernel module maps pre-existArnaldo Carvalho de Melo
We'only populating maps for kernel modules either from perf.data file PERF_RECORD_MMAP records or when parsing /proc/modules, so there is no need to first look if we already have those module maps in the list, that would mean the kernel has duplicate entries. So ditch one use of looking up maps by name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gnzjg2hhuz6jnrw91m35059y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18docs: cgroup: mm: Fix spelling of "list"Chris Down
Signed-off-by: Chris Down <chris@chrisdown.name> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: kernel-team@fb.com Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-18blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1Tejun Heo
Currently, cgroup rstat is supported only on cgroup2 hierarchy and rstat functions shouldn't be called on cgroup1 cgroups. While converting blk-cgroup core statistics to rstat, f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") accidentally ended up calling cgroup_rstat_updated() on cgroup1 cgroups causing crashes. Longer term, we probably should add cgroup1 support to rstat but for now let's mask the call directly. Fixes: f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") Tested-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"Jens Axboe
Coly says: "Guoju Fang talked to me today, he told me this change was unnecessary and I was over-thought. Then I realize fifo_idx() uses a mask to handle the array index overflow condition, so the index swap in journal_pin_cmp() won't happen. And yes, Guoju and Kent are correct. Since you already applied this patch, can you please to remove this patch from your for-next branch? This single patch does not break thing, but it is unecessary at this moment." This reverts commit c0e0954e909c17b43d176ab219fc598964616ae6. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18scsi: sd_zbc: Remove set but not used variable 'buflen'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/scsi/sd_zbc.c: In function 'sd_zbc_check_zones': drivers/scsi/sd_zbc.c:341:9: warning: variable 'buflen' set but not used [-Wunused-but-set-variable] It is not used since commit d9dd73087a8b ("block: Enhance blk_revalidate_disk_zones()") Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18perf/core: Fix the mlock accounting, againAlexander Shishkin
Commit: 5e6c3c7b1ec2 ("perf/aux: Fix tracking of auxiliary trace buffer allocation") tried to guess the correct combination of arithmetic operations that would undo the AUX buffer's mlock accounting, and failed, leaking the bottom part when an allocation needs to be charged partially to both user->locked_vm and mm->pinned_vm, eventually leaving the user with no locked bonus: $ perf record -e intel_pt//u -m1,128 uname [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.061 MB perf.data ] $ perf record -e intel_pt//u -m1,128 uname Permission error mapping pages. Consider increasing /proc/sys/kernel/perf_event_mlock_kb, or try again with a smaller value of -m/--mmap_pages. (current value: 1,128) Fix this by subtracting both locked and pinned counts when AUX buffer is unmapped. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-18dm thin: wakeup worker only when deferred bios existJeffle Xu
Single thread fio test (read, bs=4k, ioengine=libaio, iodepth=128, numjobs=1) over dm-thin device has poor performance versus bare nvme device. Further investigation with perf indicates that queue_work_on() consumes over 20% CPU time when doing IO over dm-thin device. The call stack is as follows. - 40.57% thin_map + 22.07% queue_work_on + 9.95% dm_thin_find_block + 2.80% cell_defer_no_holder 1.91% inc_all_io_entry.isra.33.part.34 + 1.78% bio_detain.isra.35 In cell_defer_no_holder(), wakeup_worker() is always called, no matter whether the tc->deferred_bio_list list is empty or not. In single thread IO model, this list is most likely empty. So skip waking up worker thread if tc->deferred_bio_list list is empty. Single thread IO performance improves from 448 MiB/s to 646 MiB/s (+44%) once the needless wake_worker() calls are properly skipped. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-18block: Don't disable interrupts in trigger_softirq()Sebastian Andrzej Siewior
trigger_softirq() is always invoked as a SMP-function call which is always invoked with disables interrupts. Don't disable interrupt in trigger_softirq() because interrupts are already disabled. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18perf record: No need to process the synthesized MMAP events twiceArnaldo Carvalho de Melo
At the end of a 'perf record' session, by default, we'll process all samples and populate the threads, maps, etc so as to find out which of the DSOs got samples, to reduce the size of the build-id table we'll add to the perf.data headers. But we don't need to process the PERF_RECORD_MMAP events synthesized for the kernel modules, as we have those already via perf_session__create_kernel_maps(), so add mmap/mmap2 handlers that first look at event->header.misc to see if the event is for a user map, bailing out if not. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mofoxvcx2dryppcw3o689jdd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18perf map: No need to adjust the long name of modulesArnaldo Carvalho de Melo
At some point in the past we needed to make sure we would get the long name of modules and not just what we get from /proc/modules, but that need, as described in the cset that introduced the adjustment function: Fixes: c03d5184f0e9 ("perf machine: Adjust dso->long_name for offline module") Without using the buildid-cache: # lsmod | grep trusted # insmod trusted.ko # lsmod | grep trusted trusted 24576 0 # strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 probe:key_seal (on key_seal in trusted) # perf probe -l probe:key_seal (on key_seal in trusted) # No attempt at opening '[trusted]'. Now using the build-id cache: # rmmod trusted # perf buildid-cache --add ./trusted.ko # insmod trusted.ko # strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 # Again, no attempt at reading '[trusted]'. Finally, adding a probe to that function and then using: [root@quaco ~]# perf trace -e probe_perf:*/max-stack=16/ --max-events=2 0.000 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263) dso__adjust_kmod_long_name (/home/acme/bin/perf) machine__process_kernel_mmap_event (/home/acme/bin/perf) machine__process_mmap_event (/home/acme/bin/perf) perf_event__process_mmap (/home/acme/bin/perf) machines__deliver_event (/home/acme/bin/perf) perf_session__deliver_event (/home/acme/bin/perf) perf_session__process_event (/home/acme/bin/perf) process_simple (/home/acme/bin/perf) reader__process_events (/home/acme/bin/perf) __perf_session__process_events (/home/acme/bin/perf) perf_session__process_events (/home/acme/bin/perf) process_buildids (/home/acme/bin/perf) record__finish_output (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) 0.055 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263) dso__adjust_kmod_long_name (/home/acme/bin/perf) machine__process_kernel_mmap_event (/home/acme/bin/perf) machine__process_mmap_event (/home/acme/bin/perf) perf_event__process_mmap (/home/acme/bin/perf) machines__deliver_event (/home/acme/bin/perf) perf_session__deliver_event (/home/acme/bin/perf) perf_session__process_event (/home/acme/bin/perf) process_simple (/home/acme/bin/perf) reader__process_events (/home/acme/bin/perf) __perf_session__process_events (/home/acme/bin/perf) perf_session__process_events (/home/acme/bin/perf) process_buildids (/home/acme/bin/perf) record__finish_output (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) # This was the only path I could find using the perf tools that reach at this function, then as of november/2019, if we put a probe in the line where the actuall setting of the dso->long_name is done: # perf trace -e probe_perf:* ^C[root@quaco ~] # perf stat -e probe_perf:* -I 2000 2.000404265 0 probe_perf:dso__adjust_kmod_long_name 4.001142200 0 probe_perf:dso__adjust_kmod_long_name 6.001704120 0 probe_perf:dso__adjust_kmod_long_name 8.002398316 0 probe_perf:dso__adjust_kmod_long_name 10.002984010 0 probe_perf:dso__adjust_kmod_long_name 12.003597851 0 probe_perf:dso__adjust_kmod_long_name 14.004113303 0 probe_perf:dso__adjust_kmod_long_name 16.004582773 0 probe_perf:dso__adjust_kmod_long_name 18.005176373 0 probe_perf:dso__adjust_kmod_long_name 20.005801605 0 probe_perf:dso__adjust_kmod_long_name 22.006467540 0 probe_perf:dso__adjust_kmod_long_name ^C 23.683261941 0 probe_perf:dso__adjust_kmod_long_name # Its not being used at all. To further test this I used kvm.ko as the offline module, i.e. removed if from the buildid-cache by nuking it completely (rm -rf ~/.debug) and moved it from the normal kernel distro path, removed the modules, stoped the kvm guest, and then installed it manually, etc. # rmmod kvm-intel # rmmod kvm # lsmod | grep kvm # modprobe kvm-intel modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: could not insert 'kvm_intel': Unknown symbol in module, or unknown parameter (see dmesg) # insmod ./kvm.ko # modprobe kvm-intel modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory # lsmod | grep kvm kvm_intel 299008 0 kvm 765952 1 kvm_intel irqbypass 16384 1 kvm # # perf probe -x ~/bin/perf machine__findnew_module_map:12 mname=m.name:string filename=filename:string 'dso_long_name=map->dso->long_name:string' 'dso_name=map->dso->name:string' # perf probe -l probe_perf:machine__findnew_module_map (on machine__findnew_module_map:12@util/machine.c in /home/acme/bin/perf with mname filename dso_long_name dso_name) # perf record ^C[ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 3.416 MB perf.data (33956 samples) ] # perf trace -e probe_perf:machine* <SNIP> 6.322 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[salsa20_generic]", filename: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_long_name: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_name: "[salsa20_generic]") 6.375 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[kvm]", filename: "[kvm]", dso_long_name: "[kvm]", dso_name: "[kvm]") <SNIP> The filename doesn't come with the path, no point in trying to set the dso->long_name. [root@quaco ~]# strace -e open,openat perf probe -m ./kvm.ko kvm_apic_local_deliver |& egrep 'open.*kvm' openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/kvm/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 7 openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 8 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/kvm.ko/5955f426cb93f03f30f3e876814be2db80ab0b55/probes", O_RDWR|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 [root@quaco ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jlfew3lyb24d58egrp0o72o2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>