git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-02-15	btrfs: remove now unused checksumming helpers	Christoph Hellwig
	Remove the unused btrfs_verify_data_csum helper, and fold btrfs_check_data_csum into its only caller. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: remove btrfs_bio_for_each_sector	Christoph Hellwig
	btrfs_bio_for_each_sector is unused now, so remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: open code btrfs_bio_free_csum	Christoph Hellwig
	btrfs_bio_free_csum has only one caller left, and that caller is always for an data inode and doesn't need zeroing of the csum pointer as that pointer will never be touched again. Just open code the conditional kfree there. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: handle checksum validation and repair at the storage layer	Christoph Hellwig
	Currently btrfs handles checksum validation and repair in the end I/O handler for the btrfs_bio. This leads to a lot of duplicate code plus issues with varying semantics or bugs, e.g. - the until recently broken repair for compressed extents - the fact that encoded reads validate the checksums but do not kick of read repair - the inconsistent checking of the BTRFS_FS_STATE_NO_CSUMS flag This commit revamps the checksum validation and repair code to instead work below the btrfs_submit_bio interfaces. In case of a checksum failure (or a plain old I/O error), the repair is now kicked off before the upper level ->end_io handler is invoked. Progress of an in-progress repair is tracked by a small structure that is allocated using a mempool for each original bio with failed sectors, which holds a reference to the original bio. This new structure is allocated using a mempool to guarantee forward progress even under memory pressure. The mempool will be replenished when the repair completes, just as the mempools backing the bios. There is one significant behavior change here: If repair fails or is impossible to start with, the whole bio will be failed to the upper layer. This is the behavior that all I/O submitters except for buffered I/O already emulated in their end_io handler. For buffered I/O this now means that a large readahead request can fail due to a single bad sector, but as readahead errors are ignored the following readpage if the sector is actually accessed will still be able to read. This also matches the I/O failure handling in other file systems. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: add a btrfs_data_csum_ok helper	Christoph Hellwig
	Add a new checksumming helper that wraps btrfs_check_data_csum and does all the checks to if we're dealing with some form of nodatacsum I/O. This helper will be used by the new storage layer checksum validation and repair code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: pre-load data checksum for reads in btrfs_submit_bio	Christoph Hellwig
	Instead of calling btrfs_lookup_bio_sums in every caller of btrfs_submit_bio that reads data, do the call once in btrfs_submit_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: save the bio iter for checksum validation in common code	Christoph Hellwig
	All callers of btrfs_submit_bio that want to validate checksums currently have to store a copy of the iter in the btrfs_bio. Move the assignment into common code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: refactor error handling in btrfs_submit_bio	Christoph Hellwig
	Add a bbio local variable and to prepare for calling functions that return a blk_status_t, rename the existing int used for error handling so that ret can be reused for the blk_status_t, and a label that can be reused for failing the passed in bio. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: simplify parameters of btrfs_lookup_bio_sums	Christoph Hellwig
	The csums argument is always NULL now, so remove it and always allocate the csums array in the btrfs_bio. Also pass the btrfs_bio instead of inode + bio to document that this function requires a btrfs_bio and not just any bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: remove the direct I/O read checksum lookup optimization	Christoph Hellwig
	To prepare for pending changes drop the optimization to only look up csums once per bio that is submitted from the iomap layer. In the short run this does cause additional lookups for fragmented direct reads, but later in the series, the bio based lookup will be used on the entire bio submitted from iomap, restoring the old behavior in common code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: add a btrfs_inode pointer to struct btrfs_bio	Christoph Hellwig
	All btrfs_bio I/Os are associated with an inode. Add a pointer to that inode, which will allow to simplify a lot of calling conventions, and which will be needed in the I/O completion path in the future. This grow the btrfs_bio structure by a pointer, but that grows will be offset by the removal of the device pointer soon. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: better document struct btrfs_bio	Christoph Hellwig
	Update the comments on btrfs_bio to better describe the structure. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	block: export bio_split_rw	Christoph Hellwig
	bio_split_rw can be used by file systems to split and incoming write bio into multiple bios fitting the hardware limit for use as ZONE_APPEND bios. Export it for initial use in btrfs. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: raid56: reduce overhead to calculate the bio length	Qu Wenruo
	In rbio_update_error_bitmap(), we need to calculate the length of the rbio. As since it's called in the endio function, we can not directly grab the length from bi_iter. Currently we call bio_for_each_segment_all(), which will always return a range inside a page. But that's not necessary as we don't really care about anything inside the page. So use bio_for_each_bvec_all(), which can return a bvec across multiple continuous pages thus reduce the loops. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: fix spelling mistakes found using codespell	Colin Ian King
	There quite a few spelling mistakes as found using codespell. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: skip backref walking during fiemap if we know the leaf is shared	Filipe Manana
	During fiemap, when checking if a data extent is shared we are doing the backref walking even if we already know the leaf is shared, which is a waste of time since if the leaf shared then the data extent is also shared. So skip the backref walking when we know we are in a shared leaf. The following test was measures the gains for a case where all leaves are shared due to a snapshot: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj umount $DEV &> /dev/null mkfs.btrfs -f $DEV # Use compression to quickly create files with a lot of extents # (each with a size of 128K). mount -o compress=lzo $DEV $MNT # 40G gives 327680 extents, each with a size of 128K. xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar # Add some more files to increase the size of the fs and extent # trees (in the real world there's a lot of files and extents # from other files). xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1 xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2 xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3 # Create a snapshot so all the extents become indirectly shared # through subtrees, with a generation less than or equals to the # generation used to create the snapshot. btrfs subvolume snapshot -r $MNT $MNT/snap1 # Unmount and mount again to clear cached metadata. umount $MNT mount -o compress=lzo $DEV $MNT start=$(date +%s%N) # The filefrag tool uses the fiemap ioctl. filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata not cached)" echo start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata cached)" umount $MNT The results were the following on a non-debug kernel (Debian's default kernel config). Before this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 1821 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 399 milliseconds (metadata cached) After this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 591 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 123 milliseconds (metadata cached) That's a speedup of 3.1x and 3.2x. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: assert commit root semaphore is held when accessing backref cache	Filipe Manana
	During fiemap, when accessing the cache that stores the sharedness of an extent, we need to either be holding a transaction handle or the commit root semaphore. I left comments about this in the comment that precedes store_backref_shared_cache() and lookup_backref_shared_cache(), but have actually not enforced it through assertions. So assert that the commit root semaphore is held if we are not holding a transaction handle. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: hold block group refcount during async discard	Boris Burkov
	Async discard does not acquire the block group reference count while it holds a reference on the discard list. This is generally OK, as the paths which destroy block groups tend to try to synchronize on cancelling async discard work. However, relying on cancelling work requires careful analysis to be sure it is safe from races with unpinning scheduling more work. While I am unable to find a race with unpinning in the current code for either the unused bgs or relocation paths, I believe we have one in an older version of auto relocation in a Meta internal build. This suggests that this is in fact an error prone model, and could be fragile to future changes to these bg deletion paths. To make this ownership more clear, add a refcount for async discard. If work is queued for a block group, its refcount should be incremented, and when work is completed or canceled, it should be decremented. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: cache utimes operations for directories if possible	Filipe Manana
	Whenever we add or remove an entry to a directory, we issue an utimes command for the directory. If we add 1000 entries to a directory (create 1000 files under it or move 1000 files to it), then we issue the same utimes command 1000 times, which increases the send stream size, results in more pipe IO, one search in the send b+tree, allocating one path for the search, etc, as well as making the receiver do a system call for each duplicated utimes command. We also issue an utimes command when we create a new directory, but later we might add entries to it corresponding to inodes with an higher inode number, so it's pointless to issue the utimes command before we create the last inode under the directory. So use a lru cache to track directories for which we must send a utimes command. When we need to remove an entry from the cache, we issue the utimes command for the respective directory. When finishing the send operation, we go over each cache element and issue the respective utimes command. Finally the caching is entirely optional, just a performance optimization, meaning that if we fail to cache (due to memory allocation failure), we issue the utimes command right away, that is, we fallback to the previous, unoptimized, behaviour. This patch belongs to a patchset comprised of the following patches: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible The following test was run before and after applying the whole patchset, and on a non-debug kernel (Debian's default kernel config): #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT mkdir $MNT/A for ((i = 1; i <= 20000; i++)); do echo -n > $MNT/A/file_$i done btrfs subvolume snapshot -r $MNT $MNT/snap1 mkdir $MNT/B for ((i = 20000; i <= 40000; i++)); do echo -n > $MNT/B/file_$i done mv $MNT/A/file_* $MNT/B/ btrfs subvolume snapshot -r $MNT $MNT/snap2 start=$(date +%s%N) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "Incremental send took $dur milliseconds" umount $MNT Before the whole patchset: 18408 milliseconds After the whole patchset: 1942 milliseconds (9.5x speedup) Using 60000 files instead of 40000: Before the whole patchset: 39764 milliseconds After the whole patchset: 3076 milliseconds (12.9x speedup) Using 20000 files instead of 40000: Before the whole patchset: 5072 milliseconds After the whole patchset: 916 milliseconds (5.5x speedup) Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: update size of roots array for backref cache entries	Filipe Manana
	Currently we limit the size of the roots array, for backref cache entries, to 12 elements. This is because that number is enough for most cases and to make the backref cache entry size to be exactly 128 bytes, so that memory is allocated from the kmalloc-128 slab and no space is wasted. However recent changes in the series refactored the backref cache to be more generic and allow it to be reused for other purposes, which resulted in increasing the size of the embedded structure btrfs_lru_cache_entry in order to allow for supporting inode numbers as keys on 32 bits system and allow multiple generations per key. This resulted in increasing the size of struct backref_cache_entry from 128 bytes to 152 bytes. Since the cache entries are allocated with kmalloc(), it means we end up using the slab kmalloc-192, so we end up wasting 40 bytes of memory. So bump the size of the roots array from 12 elements to 17 elements, so we end up using 192 bytes for each backref cache entry. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: use the lru cache to implement the name cache	Filipe Manana
	The name cache in send is basically a lru cache implemented with a radix tree and linked lists, very similar to the lru cache module which is used for the send backref cache and the cache of previously created directories during a send operation. So remove all the custom caching code for the name cache and make it use the lru cache instead. One particular detail to note is that the current cache behaves a bit differently when it comes to eviction of entries. Namely when after inserting a new name in the cache, if the cache now has 256 entries, we evict the last 128 LRU entries. The lru_cache.{c,h} module behaves a bit differently in that once we reach the cache limit, we evict a single LRU entry. In practice this doesn't make much difference, but it's actually better to evict just one entry instead of half of the entries, as there's always a chance we will need a name stored in one of that last 128 removed entries. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	usb: typec: pd: Add higher capability sysfs for sink PDO	Saranya Gopal
	28th bit of fixed supply sink PDO represents higher capability. When this bit is set, the sink device needs more than vsafe5V (eg: 12 V) to provide full functionality. This patch adds this higher capability sysfs interface for sink PDO. 28th bit of fixed supply source PDO represents usb_suspend_supported attribute. This usb_suspend_supported sysfs is already exposed for source PDOs. This patch adds 'source-capabilities' in usb_suspend_supported sysfs documentation for additional clarity. Signed-off-by: Saranya Gopal <saranya.gopal@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20230214114543.205103-2-saranya.gopal@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-15	usb: typec: pd: Remove usb_suspend_supported sysfs from sink PDO	Saranya Gopal
	As per USB PD specification, 28th bit of fixed supply sink PDO represents "higher capability" attribute and not "usb suspend supported" attribute. So, this patch removes the usb_suspend_supported attribute from sink PDO. Fixes: 662a60102c12 ("usb: typec: Separate USB Power Delivery from USB Type-C") Cc: stable <stable@kernel.org> Reported-by: Rajaram Regupathy <rajaram.regupathy@intel.com> Signed-off-by: Saranya Gopal <saranya.gopal@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20230214114543.205103-1-saranya.gopal@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-15	usb: dwc3: pci: add support for the Intel Meteor Lake-M	Heikki Krogerus
	This patch adds the necessary PCI IDs for Intel Meteor Lake-M devices. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230215132711.35668-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-15	f2fs: drop unnecessary arg for f2fs_ioc_*()	Yangtao Li
	They are not used, let's remove them. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-15	f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"	Jaegeuk Kim
	We should not truncate replaced blocks, and were supposed to truncate the first part as well. This reverts commit 78a99fe6254cad4be310cd84af39f6c46b668c72. Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-15	Merge tag 'kvm-s390-next-6.3-1' of ↵	Paolo Bonzini
	https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD * Two more V!=R patches * The last part of the cmpxchg patches * A few fixes
2023-02-15	Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD	Paolo Bonzini
	KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest
2023-02-15	wifi: mac80211: add documentation for amsdu_mesh_control	Johannes Berg
	This documentation wasn't added in the original patch, add it now. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 6e4c0d0460bd ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15	wifi: cfg80211: remove gfp parameter from ↵	Lorenzo Bianconi
	cfg80211_obss_color_collision_notify description Get rid of gfp parameter from cfg80211_obss_color_collision_notify routine description. Fixes: 935ef47b16cc ("wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/2da652e2cd5c7903191091ae9757718f1be802a1.1676453359.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15	wifi: mac80211: always initialize link_sta with sta	Johannes Berg
	When we have multiple interfaces receiving the same frame, such as a multicast frame, one interface might have a sta and the other not. In this case, link_sta would be set but not cleared again. Always set link_sta, so we keep an invariant that link_sta and sta are either both set or both not set. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15	wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()	Johannes Berg
	There's at least one case in ieee80211_rx_for_interface() where we might pass &((struct sta_info )NULL)->sta to it only to then do container_of(), and then checking the result for NULL, but checking the result of container_of() for NULL looks really odd. Fix this by just passing the struct sta_info instead. Fixes: e66b7920aa5a ("wifi: mac80211: fix initialization of rx->link and rx->link_sta") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15	wifi: cfg80211: Set SSID if it is not already set	Marc Bornand
	When a connection was established without going through NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct. Now we set it in __cfg80211_connect_result() when it is not already set. When using a userspace configuration that does not call cfg80211_connect() (can be checked with breakpoints in the kernel), this patch should allow `networkctl status device_name` to output the SSID instead of null. Cc: stable@vger.kernel.org Reported-by: Yohan Prod'homme <kernel@zoddo.fr> Fixes: 7b0a0e3c3a88 (wifi: cfg80211: do some rework towards MLO link APIs) Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711 Signed-off-by: Marc Bornand <dev.mbornand@systemb.ch> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15	selftests/bpf: Fix map_kptr test.	Alexei Starovoitov
	The compiler is optimizing out majority of unref_ptr read/writes, so the test wasn't testing much. For example, one could delete '__kptr' tag from 'struct prog_test_ref_kfunc __kptr *unref_ptr;' and the test would still "pass". Convert it to volatile stores. Confirmed by comparing bpf asm before/after. Fixes: 2cbc469a6fc3 ("selftests/bpf: Add C tests for kptr") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230214235051.22938-1-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15	Merge tag 'kvm-x86-vmx-6.3' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM VMX changes for 6.3: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps
2023-02-15	Merge tag 'kvm-x86-svm-6.3' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM SVM changes for 6.3: - Fix a mostly benign overflow bug in SEV's send\|receive_update_data() - Move the SVM-specific "host flags" into vcpu_svm (extracted from the vNMI enabling series) - A handful for fixes and cleanups
2023-02-15	HID: asus: use spinlock to safely schedule workers	Pietro Borrello
	Use spinlocks to deal with workers introducing a wrapper asus_schedule_work(), and several spinlock checks. Otherwise, asus_kbd_backlight_set() may schedule led->work after the structure has been freed, causing a use-after-free. Fixes: af22a610bc38 ("HID: asus: support backlight on USB keyboards") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230125-hid-unregister-leds-v4-5-7860c5763c38@diag.uniroma1.it Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-15	HID: asus: use spinlock to protect concurrent accesses	Pietro Borrello
	asus driver has a worker that may access data concurrently. Proct the accesses using a spinlock. Fixes: af22a610bc38 ("HID: asus: support backlight on USB keyboards") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230125-hid-unregister-leds-v4-4-7860c5763c38@diag.uniroma1.it Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-15	HID: bigben: use spinlock to safely schedule workers	Pietro Borrello
	Use spinlocks to deal with workers introducing a wrapper bigben_schedule_work(), and several spinlock checks. Otherwise, bigben_set_led() may schedule bigben->worker after the structure has been freed, causing a use-after-free. Fixes: 4eb1b01de5b9 ("HID: hid-bigbenff: fix race condition for scheduled work during removal") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230125-hid-unregister-leds-v4-3-7860c5763c38@diag.uniroma1.it Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-15	HID: bigben_worker() remove unneeded check on report_field	Pietro Borrello
	bigben_worker() checks report_field to be non-NULL. The check has been added in commit 918aa1ef104d ("HID: bigbenff: prevent null pointer dereference") to prevent a NULL pointer crash. However, the true root cause was a missing check for output reports, patched in commit c7bf714f8755 ("HID: check empty report_list in bigben_probe()"), where the type-confused report list_entry was overlapping with a NULL pointer, which was then causing the crash. Fixes: 918aa1ef104d ("HID: bigbenff: prevent null pointer dereference") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230125-hid-unregister-leds-v4-2-7860c5763c38@diag.uniroma1.it Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-15	HID: bigben: use spinlock to protect concurrent accesses	Pietro Borrello
	bigben driver has a worker that may access data concurrently. Proct the accesses using a spinlock. Fixes: 256a90ed9e46 ("HID: hid-bigbenff: driver for BigBen Interactive PS3OFMINIPAD gamepad") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230125-hid-unregister-leds-v4-1-7860c5763c38@diag.uniroma1.it Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-15	ring-buffer: Handle race between rb_move_tail and rb_check_pages	Mukesh Ojha
	It seems a data race between ring_buffer writing and integrity check. That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG was cleared when doing integrity check rb_check_pages(): rb_check_pages() rb_handle_head_page(): -------- -------- rb_head_page_deactivate() rb_head_page_set_normal() rb_head_page_activate() We do intergrity test of the list to check if the list is corrupted and it is still worth doing it. So, let's refactor rb_check_pages() such that we no longer clear and set flag during the list sanity checking. [1] and [2] are the test to reproduce and the crash report respectively. 1: ``` read_trace.sh while true; do # the "trace" file is closed after read head -1 /sys/kernel/tracing/trace > /dev/null done ``` ``` repro.sh sysctl -w kernel.panic_on_warn=1 # function tracer will writing enough data into ring_buffer echo function > /sys/kernel/tracing/current_tracer ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ``` 2: ------------[ cut here ]------------ WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653 rb_move_tail+0x450/0x470 Modules linked in: CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G W 6.2.0-rc6+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:rb_move_tail+0x450/0x470 Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24 83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83 f8 02 0f 84 ce fb ff ff e9 db RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203 RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18 RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400 RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2 R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000 R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108 FS: 0000000000000000(0000) GS:ffff9db3bdcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0 Call Trace: <TASK> ring_buffer_lock_reserve+0x136/0x360 ? __do_softirq+0x287/0x2df ? __pfx_rcu_softirq_qs+0x10/0x10 trace_function+0x21/0x110 ? __pfx_rcu_softirq_qs+0x10/0x10 ? __do_softirq+0x287/0x2df function_trace_call+0xf6/0x120 0xffffffffc038f097 ? rcu_softirq_qs+0x5/0x140 rcu_softirq_qs+0x5/0x140 __do_softirq+0x287/0x2df run_ksoftirqd+0x2a/0x30 smpboot_thread_fn+0x188/0x220 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0xe7/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- [ crash report and test reproducer credit goes to Zheng Yejian] Link: https://lore.kernel.org/linux-trace-kernel/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com Cc: <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-15	selftests/bpf: Cross-compile bpftool	Björn Töpel
	When the BPF selftests are cross-compiled, only the a host version of bpftool is built. This version of bpftool is used on the host-side to generate various intermediates, e.g., skeletons. The test runners are also using bpftool, so the Makefile will symlink bpftool from the selftest/bpf root, where the test runners will look the tool: \| $(Q)ln -sf $(if $2,..,.)/tools/build/bpftool/bootstrap/bpftool \ \| $(OUTPUT)/$(if $2,$2/)bpftool There are two problems for cross-compilation builds: 1. There is no native (cross-compilation target) of bpftool 2. The bootstrap/bpftool is never cross-compiled (by design) Make sure that a native/cross-compiled version of bpftool is built, and if CROSS_COMPILE is set, symlink the native/non-bootstrap version. Acked-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230214161253.183458-1-bjorn@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15	bpf, docs: Add myself to BPF docs MAINTAINERS entry	David Vernet
	In commit 7e2a9ebe8126 ("docs, bpf: Ensure IETF's BPF mailing list gets copied for ISA doc changes"), a new MAINTAINERS entry was added for any BPF IETF documentation updates for the ongoing standardization process. I've been making it a point to try and review as many BPF documentation patches as possible, and have made a committment to Alexei to consistently review BPF standardization patches going forward. This patch adds my name as a reviewer to the MAINTAINERS entry for the standardization effort. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230214223553.78353-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15	selftests/bpf: Fix build error for LoongArch	Tiezhu Yang
	There exists build error when make -C tools/testing/selftests/bpf/ on LoongArch: BINARY test_verifier In file included from test_verifier.c:27: tools/include/uapi/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type 14 \| bpf_user_pt_regs_t regs; \| ^~~~ make: *** [Makefile:577: tools/testing/selftests/bpf/test_verifier] Error 1 make: Leaving directory 'tools/testing/selftests/bpf' Add missing uapi header for LoongArch to use the following definition: typedef struct user_pt_regs bpf_user_pt_regs_t; Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1676458867-22052-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15	Documentation: bpf: Add missing line break separator in node_data struct ↵	Bagas Sanjaya
	code block Stephen Rothwell reported htmldocs warning when merging bpf-next tree, which was the same warning as reported by kernel test robot: Documentation/bpf/graph_ds_impl.rst:62: ERROR: Error in "code-block" directive: maximum 1 argument(s) allowed, 12 supplied. The error is due to Sphinx confuses node_data struct declaration with code-block directive option. Fix the warning by separating the code-block marker with node_data struct declaration. Link: https://lore.kernel.org/linux-next/20230215144505.4751d823@canb.auug.org.au/ Link: https://lore.kernel.org/linux-doc/202302151123.wUE5FYFx-lkp@intel.com/ Fixes: c31315c3aa0929 ("bpf, documentation: Add graph documentation for non-owning refs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20230215123253.41552-3-bagasdotme@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15	thermal/drivers/st: Remove syscfg based driver	Alain Volmat
	The syscfg based thermal driver is only supporting STiH415 STiH416 and STiD127 platforms which are all no more supported. We can thus safely remove this driver since the remaining STi platform STiH407/STiH410 and STiH418 are all using the memmap based thermal driver. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20230209091659.1409-7-avolmat@me.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal: Remove core header inclusion from drivers	Daniel Lezcano
	As the name states "thermal_core.h" is the header file for the core components of the thermal framework. Too many drivers are including it. Hopefully the recent cleanups helped to self encapsulate the code a bit more and prevented the drivers to need this header. Remove this inclusion in every place where it is possible. Some other drivers did a confusion with the core header and the one exported in linux/thermal.h. They include the former instead of the latter. The changes also fix this. The tegra/soctherm driver still remains as it uses an internal function which need to be replaced. The Intel HFI driver uses the netlink internal framework core and should be changed to prevent to deal with the internals. No functional changes intended. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> # armada_thermal.c Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> # uniphier_thermal.c Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> # rcar_gen3_thermal.c Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> # amlogic_thermal.c Acked-by: Florian Fainelli <f.fainelli@gmail.com> # bcm2835_thermal.c Acked-by: Thierry Reding <treding@nvidia.com> # tegra30-tsensor.c Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	tools/lib/thermal: Fix include path for libnl3 in pkg-config file.	Vibhav Pant
	Fixes pkg-config returning malformed CFLAGS for libthermal. Signed-off-by: Vibhav Pant <vibhavp@gmail.com> Link: https://lore.kernel.org/r/20230211081935.62690-1-vibhavp@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/hisi: Drop second sensor hi3660	Yongqin Liu
	The commit 74c8e6bffbe1 ("driver core: Add __alloc_size hint to devm allocators") exposes a panic "BRK handler: Fatal exception" on the hi3660_thermal_probe funciton. This is because the function allocates memory for only one sensors array entry, but tries to fill up a second one. Fix this by removing the unneeded second access. Fixes: 7d3a2a2bbadb ("thermal/drivers/hisi: Fix number of sensors on hi3660") Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org> Link: https://lore.kernel.org/linux-mm/20221101223321.1326815-5-keescook@chromium.org/ Link: https://lore.kernel.org/r/20230210141507.71014-1-yongqin.liu@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>