summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-15btrfs: add a btrfs_data_csum_ok helperChristoph Hellwig
Add a new checksumming helper that wraps btrfs_check_data_csum and does all the checks to if we're dealing with some form of nodatacsum I/O. This helper will be used by the new storage layer checksum validation and repair code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: pre-load data checksum for reads in btrfs_submit_bioChristoph Hellwig
Instead of calling btrfs_lookup_bio_sums in every caller of btrfs_submit_bio that reads data, do the call once in btrfs_submit_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: save the bio iter for checksum validation in common codeChristoph Hellwig
All callers of btrfs_submit_bio that want to validate checksums currently have to store a copy of the iter in the btrfs_bio. Move the assignment into common code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: refactor error handling in btrfs_submit_bioChristoph Hellwig
Add a bbio local variable and to prepare for calling functions that return a blk_status_t, rename the existing int used for error handling so that ret can be reused for the blk_status_t, and a label that can be reused for failing the passed in bio. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: simplify parameters of btrfs_lookup_bio_sumsChristoph Hellwig
The csums argument is always NULL now, so remove it and always allocate the csums array in the btrfs_bio. Also pass the btrfs_bio instead of inode + bio to document that this function requires a btrfs_bio and not just any bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: remove the direct I/O read checksum lookup optimizationChristoph Hellwig
To prepare for pending changes drop the optimization to only look up csums once per bio that is submitted from the iomap layer. In the short run this does cause additional lookups for fragmented direct reads, but later in the series, the bio based lookup will be used on the entire bio submitted from iomap, restoring the old behavior in common code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: add a btrfs_inode pointer to struct btrfs_bioChristoph Hellwig
All btrfs_bio I/Os are associated with an inode. Add a pointer to that inode, which will allow to simplify a lot of calling conventions, and which will be needed in the I/O completion path in the future. This grow the btrfs_bio structure by a pointer, but that grows will be offset by the removal of the device pointer soon. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: better document struct btrfs_bioChristoph Hellwig
Update the comments on btrfs_bio to better describe the structure. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15block: export bio_split_rwChristoph Hellwig
bio_split_rw can be used by file systems to split and incoming write bio into multiple bios fitting the hardware limit for use as ZONE_APPEND bios. Export it for initial use in btrfs. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: raid56: reduce overhead to calculate the bio lengthQu Wenruo
In rbio_update_error_bitmap(), we need to calculate the length of the rbio. As since it's called in the endio function, we can not directly grab the length from bi_iter. Currently we call bio_for_each_segment_all(), which will always return a range inside a page. But that's not necessary as we don't really care about anything inside the page. So use bio_for_each_bvec_all(), which can return a bvec across multiple continuous pages thus reduce the loops. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: fix spelling mistakes found using codespellColin Ian King
There quite a few spelling mistakes as found using codespell. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: skip backref walking during fiemap if we know the leaf is sharedFilipe Manana
During fiemap, when checking if a data extent is shared we are doing the backref walking even if we already know the leaf is shared, which is a waste of time since if the leaf shared then the data extent is also shared. So skip the backref walking when we know we are in a shared leaf. The following test was measures the gains for a case where all leaves are shared due to a snapshot: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj umount $DEV &> /dev/null mkfs.btrfs -f $DEV # Use compression to quickly create files with a lot of extents # (each with a size of 128K). mount -o compress=lzo $DEV $MNT # 40G gives 327680 extents, each with a size of 128K. xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar # Add some more files to increase the size of the fs and extent # trees (in the real world there's a lot of files and extents # from other files). xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1 xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2 xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3 # Create a snapshot so all the extents become indirectly shared # through subtrees, with a generation less than or equals to the # generation used to create the snapshot. btrfs subvolume snapshot -r $MNT $MNT/snap1 # Unmount and mount again to clear cached metadata. umount $MNT mount -o compress=lzo $DEV $MNT start=$(date +%s%N) # The filefrag tool uses the fiemap ioctl. filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata not cached)" echo start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata cached)" umount $MNT The results were the following on a non-debug kernel (Debian's default kernel config). Before this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 1821 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 399 milliseconds (metadata cached) After this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 591 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 123 milliseconds (metadata cached) That's a speedup of 3.1x and 3.2x. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: assert commit root semaphore is held when accessing backref cacheFilipe Manana
During fiemap, when accessing the cache that stores the sharedness of an extent, we need to either be holding a transaction handle or the commit root semaphore. I left comments about this in the comment that precedes store_backref_shared_cache() and lookup_backref_shared_cache(), but have actually not enforced it through assertions. So assert that the commit root semaphore is held if we are not holding a transaction handle. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: hold block group refcount during async discardBoris Burkov
Async discard does not acquire the block group reference count while it holds a reference on the discard list. This is generally OK, as the paths which destroy block groups tend to try to synchronize on cancelling async discard work. However, relying on cancelling work requires careful analysis to be sure it is safe from races with unpinning scheduling more work. While I am unable to find a race with unpinning in the current code for either the unused bgs or relocation paths, I believe we have one in an older version of auto relocation in a Meta internal build. This suggests that this is in fact an error prone model, and could be fragile to future changes to these bg deletion paths. To make this ownership more clear, add a refcount for async discard. If work is queued for a block group, its refcount should be incremented, and when work is completed or canceled, it should be decremented. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: send: cache utimes operations for directories if possibleFilipe Manana
Whenever we add or remove an entry to a directory, we issue an utimes command for the directory. If we add 1000 entries to a directory (create 1000 files under it or move 1000 files to it), then we issue the same utimes command 1000 times, which increases the send stream size, results in more pipe IO, one search in the send b+tree, allocating one path for the search, etc, as well as making the receiver do a system call for each duplicated utimes command. We also issue an utimes command when we create a new directory, but later we might add entries to it corresponding to inodes with an higher inode number, so it's pointless to issue the utimes command before we create the last inode under the directory. So use a lru cache to track directories for which we must send a utimes command. When we need to remove an entry from the cache, we issue the utimes command for the respective directory. When finishing the send operation, we go over each cache element and issue the respective utimes command. Finally the caching is entirely optional, just a performance optimization, meaning that if we fail to cache (due to memory allocation failure), we issue the utimes command right away, that is, we fallback to the previous, unoptimized, behaviour. This patch belongs to a patchset comprised of the following patches: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible The following test was run before and after applying the whole patchset, and on a non-debug kernel (Debian's default kernel config): #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT mkdir $MNT/A for ((i = 1; i <= 20000; i++)); do echo -n > $MNT/A/file_$i done btrfs subvolume snapshot -r $MNT $MNT/snap1 mkdir $MNT/B for ((i = 20000; i <= 40000; i++)); do echo -n > $MNT/B/file_$i done mv $MNT/A/file_* $MNT/B/ btrfs subvolume snapshot -r $MNT $MNT/snap2 start=$(date +%s%N) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "Incremental send took $dur milliseconds" umount $MNT Before the whole patchset: 18408 milliseconds After the whole patchset: 1942 milliseconds (9.5x speedup) Using 60000 files instead of 40000: Before the whole patchset: 39764 milliseconds After the whole patchset: 3076 milliseconds (12.9x speedup) Using 20000 files instead of 40000: Before the whole patchset: 5072 milliseconds After the whole patchset: 916 milliseconds (5.5x speedup) Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: send: update size of roots array for backref cache entriesFilipe Manana
Currently we limit the size of the roots array, for backref cache entries, to 12 elements. This is because that number is enough for most cases and to make the backref cache entry size to be exactly 128 bytes, so that memory is allocated from the kmalloc-128 slab and no space is wasted. However recent changes in the series refactored the backref cache to be more generic and allow it to be reused for other purposes, which resulted in increasing the size of the embedded structure btrfs_lru_cache_entry in order to allow for supporting inode numbers as keys on 32 bits system and allow multiple generations per key. This resulted in increasing the size of struct backref_cache_entry from 128 bytes to 152 bytes. Since the cache entries are allocated with kmalloc(), it means we end up using the slab kmalloc-192, so we end up wasting 40 bytes of memory. So bump the size of the roots array from 12 elements to 17 elements, so we end up using 192 bytes for each backref cache entry. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: send: use the lru cache to implement the name cacheFilipe Manana
The name cache in send is basically a lru cache implemented with a radix tree and linked lists, very similar to the lru cache module which is used for the send backref cache and the cache of previously created directories during a send operation. So remove all the custom caching code for the name cache and make it use the lru cache instead. One particular detail to note is that the current cache behaves a bit differently when it comes to eviction of entries. Namely when after inserting a new name in the cache, if the cache now has 256 entries, we evict the last 128 LRU entries. The lru_cache.{c,h} module behaves a bit differently in that once we reach the cache limit, we evict a single LRU entry. In practice this doesn't make much difference, but it's actually better to evict just one entry instead of half of the entries, as there's always a chance we will need a name stored in one of that last 128 removed entries. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15thermal/drivers/st: Remove syscfg based driverAlain Volmat
The syscfg based thermal driver is only supporting STiH415 STiH416 and STiD127 platforms which are all no more supported. We can thus safely remove this driver since the remaining STi platform STiH407/STiH410 and STiH418 are all using the memmap based thermal driver. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20230209091659.1409-7-avolmat@me.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal: Remove core header inclusion from driversDaniel Lezcano
As the name states "thermal_core.h" is the header file for the core components of the thermal framework. Too many drivers are including it. Hopefully the recent cleanups helped to self encapsulate the code a bit more and prevented the drivers to need this header. Remove this inclusion in every place where it is possible. Some other drivers did a confusion with the core header and the one exported in linux/thermal.h. They include the former instead of the latter. The changes also fix this. The tegra/soctherm driver still remains as it uses an internal function which need to be replaced. The Intel HFI driver uses the netlink internal framework core and should be changed to prevent to deal with the internals. No functional changes intended. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> # armada_thermal.c Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> # uniphier_thermal.c Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> # rcar_gen3_thermal.c Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> # amlogic_thermal.c Acked-by: Florian Fainelli <f.fainelli@gmail.com> # bcm2835_thermal.c Acked-by: Thierry Reding <treding@nvidia.com> # tegra30-tsensor.c Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15tools/lib/thermal: Fix include path for libnl3 in pkg-config file.Vibhav Pant
Fixes pkg-config returning malformed CFLAGS for libthermal. Signed-off-by: Vibhav Pant <vibhavp@gmail.com> Link: https://lore.kernel.org/r/20230211081935.62690-1-vibhavp@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/hisi: Drop second sensor hi3660Yongqin Liu
The commit 74c8e6bffbe1 ("driver core: Add __alloc_size hint to devm allocators") exposes a panic "BRK handler: Fatal exception" on the hi3660_thermal_probe funciton. This is because the function allocates memory for only one sensors array entry, but tries to fill up a second one. Fix this by removing the unneeded second access. Fixes: 7d3a2a2bbadb ("thermal/drivers/hisi: Fix number of sensors on hi3660") Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org> Link: https://lore.kernel.org/linux-mm/20221101223321.1326815-5-keescook@chromium.org/ Link: https://lore.kernel.org/r/20230210141507.71014-1-yongqin.liu@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/rcar_gen3_thermal: Fix device initializationNiklas Söderlund
The thermal zone is registered before the device is register and the thermal coefficients are calculated, providing a window for very incorrect readings. The reason why the zone was register before the device was fully initialized was that the presence of the set_trips() callback is used to determine if the driver supports interrupt or not, as it is not defined if the device is incapable of interrupts. Fix this by using the operations structure in the private data instead of the zone to determine if interrupts are available or not, and initialize the device before registering the zone. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20230208190333.3159879-4-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/rcar_gen3_thermal: Create device local ops structNiklas Söderlund
The callback operations are modified on a driver global level. If one device tree description do not define interrupts, the set_trips() operation was disabled globally for all users of the driver. Fix this by creating a device local copy of the operations structure and modify the copy depending on what the device can do. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20230208190333.3159879-3-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resumingNiklas Söderlund
There is no need to explicitly call set_trips() when resuming from suspend. The thermal framework calls thermal_zone_device_update() that restores the trip points. Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20230208190333.3159879-2-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/rcar_gen3: Add support for R-Car V4HGeert Uytterhoeven
Add support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4H (R8A779G0) SoC. According to the R-Car V4H Hardware User's Manual Rev. 0.70, the (preliminary) conversion formula for the thermal sensor is the same as for most other R-Car Gen3 and Gen4 SoCs, while the (preliminary) conversion formula for the chip internal voltage monitor differs. As the driver only uses the former, no further changes are needed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/852048eb5f4cc001be7a97744f4c5caea912d071.1675958665.git.geert+renesas@glider.be Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 supportGeert Uytterhoeven
Document support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4H (R8A779G0) SoC. Unlike most other R-Car Gen3 and Gen4 SoCs, it has 4 instead of 3 sensors, so increase the maximum number of reg tuples. Just like other R-Car Gen4 SoCs, interrupts are not routed to the INTC-AP but to the ECM. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/11f740522ec479011cc8eef6bb450603be394def.1675958665.git.geert+renesas@glider.be Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driverBalsam CHIHI
The Low Voltage Thermal Sensor (LVTS) is a multiple sensors, multi controllers contained in a thermal domain. A thermal domains can be the MCU or the AP. Each thermal domains contain up to seven controllers, each thermal controller handle up to four thermal sensors. The LVTS has two Finite State Machines (FSM), one to handle the functionin temperatures range like hot or cold temperature and another one to handle monitoring trip point. The FSM notifies via interrupts when a trip point is crossed. The interrupt is managed at the thermal controller level, so when an interrupt occurs, the driver has to find out which sensor triggered such an interrupt. The sampling of the thermal can be filtered or immediate. For the former, the LVTS measures several points and applies a low pass filter. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> On MT8195 Tomato Chromebook: Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230209105628.50294-5-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15dt-bindings: thermal: mediatek: Add LVTS thermal controllersBalsam CHIHI
Add LVTS thermal controllers dt-binding definition for mt8192 and mt8195. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230209105628.50294-3-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal/drivers/mediatek: Relocate driver to mediatek folderBalsam CHIHI
Add MediaTek proprietary folder to upstream more thermal zone and cooler drivers, relocate the original thermal controller driver to it, and rename it as "auxadc_thermal.c" to show its purpose more clearly. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230209105628.50294-2-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15tools/lib/thermal: Fix thermal_sampling_exit()Vincent Guittot
thermal_sampling_init() suscribes to THERMAL_GENL_SAMPLING_GROUP_NAME group so thermal_sampling_exit() should unsubscribe from the same group. Fixes: 47c4b0de080a ("tools/lib/thermal: Add a thermal library") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230202102812.453357-1-vincent.guittot@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15Merge branch 'thermal-intel'Rafael J. Wysocki
Merge thermal control changes related to Intel platforms for 6.3-rc1: - Rework ACPI helper functions for thermal control to retrieve a trip point temperature instead of initializing a trip point objetc (Rafael Wysocki). - Clean up and improve the int340x thermal driver ((Rafael Wysocki). - Simplify and clean up the intel_pch thermal driver ((Rafael Wysocki). - Fix the Intel powerclamp thermal driver and make it use the common idle injection framework (Srinivas Pandruvada). - Add two module parameters, cpumask and max_idle, to the Intel powerclamp thermal driver to allow it to affect only a specific subset of CPUs instead of all of them (Srinivas Pandruvada). - Make the Intel quark_dts thermal driver Use generic trip point objects instead of its own trip point representation (Daniel Lezcano). - Add toctree entry for thermal documents and fix two issues in the Intel powerclamp driver documentation (Bagas Sanjaya). * thermal-intel: (25 commits) Documentation: powerclamp: Fix numbered lists formatting Documentation: powerclamp: Escape wildcard in cpumask description Documentation: admin-guide: Add toctree entry for thermal docs thermal: intel: powerclamp: Add two module parameters Documentation: admin-guide: Move intel_powerclamp documentation thermal: intel: powerclamp: Fix duration module parameter thermal: intel: powerclamp: Return last requested state as cur_state thermal: intel: quark_dts: Use generic trip points thermal: intel: powerclamp: Use powercap idle-inject feature powercap: idle_inject: Add update callback powercap: idle_inject: Export symbols thermal: intel: powerclamp: Fix cur_state for multi package system thermal: intel: intel_pch: Drop struct board_info thermal: intel: intel_pch: Rename board ID symbols thermal: intel: intel_pch: Fold suspend and resume routines into their callers thermal: intel: intel_pch: Fold two functions into their callers thermal: intel: intel_pch: Eliminate device operations object thermal: intel: intel_pch: Rename device operations callbacks thermal: intel: intel_pch: Eliminate redundant return pointers thermal: intel: intel_pch: Make pch_wpt_add_acpi_psv_trip() return int ...
2023-02-15Merge branch 'thermal-core'Rafael J. Wysocki
Merge thermal control core changes for 6.3-rc1: - Clean up thermal device unregistration code (Viresh Kumar). - Fix and clean up thermal control core initialization error code paths (Daniel Lezcano). - Relocate the trip points handling code into a separate file (Daniel Lezcano). - Make the thermal core fail registration of thermal zones and cooling devices if the thermal class has not been registered (Rafael Wysocki). - Make the core thermal control code use sysfs_emit_at() instead of scnprintf() where applicable (ye xingchen). * thermal-core: thermal: core: Use sysfs_emit_at() instead of scnprintf() thermal: Fail object registration if thermal class is not registered thermal/core: Move the thermal trip code to a dedicated file thermal/core: Remove unneeded ida_destroy() thermal/core: Fix unregistering netlink at thermal init time thermal: core: Use device_unregister() instead of device_del/put() thermal: core: Move cdev cleanup to thermal_release()
2023-02-15gpio: mlxbf2: select GPIOLIB_IRQCHIPLinus Walleij
This driver uncondictionally uses the GPIOLIB_IRQCHIP so select it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-02-15Merge branches 'pm-cpuidle', 'pm-core' and 'pm-sleep'Rafael J. Wysocki
Merge cpuidle updates, PM core updates and changes related to system sleep handling for 6.3-rc1: - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski). - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() which allows MWAIT to be used (Li RongQing). - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy). - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann). - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh). - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki). - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski). - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald). - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald). - Drop "select SRCU" from system sleep Kconfig (Paul E. McKenney). - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap). * pm-cpuidle: cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies intel_idle: add Emerald Rapids Xeon support cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle() cpuidle-haltpoll: select haltpoll governor cpuidle: teo: Introduce util-awareness cpuidle: teo: Optionally skip polling states in teo_find_shallower_state() * pm-core: PM: Add EXPORT macros for exporting PM functions PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND * pm-sleep: PM: sleep: Remove "select SRCU" PM: hibernate: swap: don't use /** for non-kernel-doc comments
2023-02-15gpiolib: acpi: Add a ignore wakeup quirk for Clevo NH5xAxWerner Sembach
The commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") changed the policy such that I2C touchpads may be able to wake up the system by default if the system is configured as such. However for some devices there is a bug, that is causing the touchpad to instantly wake up the device again once it gets deactivated. The root cause is still under investigation (see Link tag). To workaround this problem for the time being, introduce a quirk for this model that will prevent the wakeup capability for being set for GPIO 16. Fixes: 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") Link: https://lore.kernel.org/linux-acpi/20230210164636.628462-1-wse@tuxedocomputers.com/ Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2023-02-15Documentation: amd-pstate: disambiguate user space sectionsBagas Sanjaya
kernel test robot reported htmldocs warning: Documentation/admin-guide/pm/amd-pstate.rst:343: WARNING: duplicate label admin-guide/pm/amd-pstate:user space interface in ``sysfs``, other instance in Documentation/admin-guide/pm/amd-pstate.rst The documentation contains two sections with the same "User Space Interface in ``sysfs``" title. The first one deals with per-policy sysfs and the second one is about general attributes (currently only global attributes are documented). Disambiguate title text of both sections to fix the warning. Link: https://lore.kernel.org/linux-doc/202302151041.0SWs1RHK-lkp@intel.com/ Fixes: b9e6a2d47b2565 ("Documentation: amd-pstate: introduce new global sysfs attributes") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQWyes Karny
`amd_pstate_set_epp` function uses `cppc_req_cached` and `epp` variable to update the MSR_AMD_CPPC_REQ register for AMD MSR systems. The recent commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") changed the sequence of updating cppc_req_cached and writing the MSR_AMD_CPPC_REQ. Therefore while switching from powersave to performance governor and vice-versa in active mode MSR_AMD_CPPC_REQ is set with the previous cached value. To fix this: first update the `cppc_req_cached` variable and then call `amd_pstate_set_epp` function. - Before commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.652743] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 1.652744] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 1.652746] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 300.493842] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 300.493846] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 300.493847] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 - After commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.646037] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 1.646038] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 [ 1.646042] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff Changing to performance governor: [ 687.117401] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 687.117405] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 [ 687.117419] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff - After this fix: With powersave governor: [ 2.525717] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 2.525720] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 2.525722] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 3440.152468] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 3440.152473] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 3440.152474] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 Fixes: 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") Signed-off-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15gpio: vf610: make irq_chip immutableAlexander Stein
Since recently, the kernel is nagging about mutable irq_chips: "not an immutable chip, please consider fixing it!" Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new helper functions and call the appropriate gpiolib functions. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-02-15Merge branches 'acpi-video', 'acpi-misc' and 'acpi-docs'Rafael J. Wysocki
Merge ACPI backlight driver changes, miscellaneous ACPI-related changes and ACPI-related documentation updates for 6.3-rc1: - Fix Lenovo Ideapad Z570 DMI match in the ACPI backlight driver (Hans de Goede). - Silence missing prototype warnings in some places in the ACPI-related code (Ammar Faizi). - Make kobj_type structures used in the ACPI code constant (Thomas Weißschuh). - Correct spelling in firmware-guide/ACPI (Randy Dunlap). - Clarify the meaning of Explicit and Implicit in the _DSD GPIO properties documentation (Andy Shevchenko). * acpi-video: ACPI: video: Fix Lenovo Ideapad Z570 DMI match * acpi-misc: ACPI: make kobj_type structures constant ACPI: Silence missing prototype warnings * acpi-docs: Documentation: firmware-guide: gpio-properties: Clarify Explicit and Implicit Documentation: firmware-guide/ACPI: correct spelling
2023-02-15Merge branches 'acpi-resource', 'acpi-pmic', 'acpi-battery' and 'acpi-apei'Rafael J. Wysocki
Merge ACPI resources handling changes, ACPI PMIC and battery drivers changes and ACPI APEI changes for 6.3-rc1: - Add two more entries to the ACPI IRQ override quirk list (Adam Niederer, Werner Sembach). - Add a pmic_i2c_address entry for Intel Bay Trail Crystal Cove to allow intel_soc_pmic_exec_mipi_pmic_seq_element() to be used with the Bay Trail Crystal Cove PMIC OpRegion driver (Hans de Goede). - Add comments with DSDT power OpRegion field names to the ACPI PMIC driver (Hans de Goede). - Fix string termination handling in the ACPI battery driver (Armin Wolf). - Limit error type to 32-bit width in the ACPI APEI error injection code (Shuai Xue). * acpi-resource: ACPI: resource: Do IRQ override on all TongFang GMxRGxx ACPI: resource: Add IRQ overrides for MAINGEAR Vector Pro 2 models * acpi-pmic: ACPI: PMIC: Add comments with DSDT power opregion field names ACPI: PMIC: Add pmic_i2c_address to BYT Crystal Cove support * acpi-battery: ACPI: battery: Increase maximum string length ACPI: battery: Fix buffer overread if not NUL-terminated ACPI: battery: Fix missing NUL-termination with large strings * acpi-apei: ACPI: APEI: EINJ: Limit error type to 32-bit width
2023-02-15Merge branches 'acpi-processor', 'acpi-tables', 'acpi-pnp' and ↵Rafael J. Wysocki
'acpi-maintainers' Merge ACPI processor driver changes, ACPI table parser changes, ACPI device enumeration changes related to PNP and a MAINTAINERS update related to ACPI for 6.3-rc1: - Drop an unnecessary (void *) conversion from the ACPI processor driver (Zhou jie). - Modify the ACPI processor performance library code to use the "no limit" frequency QoS as appropriate and adjust the intel_pstate driver accordingly (Rafael Wysocki). - Add support for NBFT to the ACPI table parser (Stuart Hayes). - Introduce list of known non-PNP devices to avoid enumerating some of them as PNP devices (Rafael Wysocki). - Add x86 ACPI paths to the ACPI entry in MAINTAINERS to allow scripts to report the actual maintainers information (Rafael Wysocki). * acpi-processor: cpufreq: intel_pstate: Drop ACPI _PSS states table patching ACPI: processor: perflib: Avoid updating frequency QoS unnecessarily ACPI: processor: perflib: Use the "no limit" frequency QoS ACPI: processor: idle: Drop unnecessary (void *) conversion * acpi-tables: ACPI: tables: Add support for NBFT * acpi-pnp: ACPI: PNP: Introduce list of known non-PNP devices * acpi-maintainers: MAINTAINERS: Add x86 ACPI paths to the ACPI entry
2023-02-15Merge branch 'acpica'Rafael J. Wysocki
Merge ACPICA changes for 6.3-rc1: - Drop port I/O validation for some regions to avoid AML failures due to rejections of legitimate port I/O writes (Mario Limonciello). - Constify acpi_get_handle() pathname argument to allow its callers to pass conts pathnames to it (Sakari Ailus). - Prevent acpi_ns_simple_repair() from crashing in some cases when AE_AML_NO_RETURN_VALUE should be returned (Daniil Tatianin). - Fix typo in CDAT DSMAS struct definition (Lukas Wunner). * acpica: ACPICA: Fix typo in CDAT DSMAS struct definition ACPICA: nsrepair: handle cases without a return value correctly ACPICA: Constify pathname argument for acpi_get_handle() ACPICA: Drop port I/O validation for some regions
2023-02-15Merge tag 'qcom-arm64-defconfig-for-6.3-3' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/defconfig Few more Qualcomm ARM64 defconfig updates for v6.3 This enables the drivers needed to support USB Type-C based external display on the SC8280XP laptops. It also enables a couple of core drivers for the Qualcomm SA8775P platform. * tag 'qcom-arm64-defconfig-for-6.3-3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform arm64: defconfig: Enable DisplayPort on SC8280XP laptops Link: https://lore.kernel.org/r/20230215051757.1166709-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-15Merge tag 'qcom-arm64-for-6.3-3' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt Last set of Qualcomm ARM64 DTS updates for v6.3 This introduces additional DisplayPort controllers and pmic_glink on SC8280XP (8cx Gen3), which provides support for USB Type-C-based displays on the the Lenovo ThinkPad X13s and the compute reference device. The pmic_glink also provides battery and power supply status. Interrupt-parents are corrected across the SC8280XP PMICs, to allow non-Linux OSs to properly handle interrupts in the various blocks therein. It cleans up the SM8350 base dtsi and introduces GPU support on this platform, as well as enable this for the Hardware Development Kit (HDK). It enables i2c busses on the Fairphone FP4 Lastly it aligns glink node names with bindings across a few platforms, and corrects the compatible for the PON block in the pmk8350 PMIC. * tag 'qcom-arm64-for-6.3-3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings arm64: dts: qcom: qcs404: align RPM G-Link node with bindings arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam arm64: dts: qcom: sc7280: Adjust zombie PWM frequency arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses arm64: dts: qcom: sm7225-fairphone-fp4: move status property down arm64: dts: qcom: pmk8350: Use the correct PON compatible arm64: dts: qcom: sc8280xp-x13s: Enable external display arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks arm64: dts: qcom: sm8350-hdk: enable GPU arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes arm64: dts: qcom: sm8350: finish reordering nodes arm64: dts: qcom: sm8350: move more nodes to correct place arm64: dts: qcom: sm8350: reorder device nodes Link: https://lore.kernel.org/r/20230215051530.1165953-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-15gpiolib: acpi: remove redundant declarationRaag Jadav
Remove acpi_device declaration, as it is no longer needed. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2023-02-15perf/x86: Refuse to export capabilities for hybrid PMUsSean Christopherson
Now that KVM disables vPMU support on hybrid CPUs, WARN and return zeros if perf_get_x86_pmu_capability() is invoked on a hybrid CPU. The helper doesn't provide an accurate accounting of the PMU capabilities for hybrid CPUs and needs to be enhanced if KVM, or anything else outside of perf, wants to act on the PMU capabilities. Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208204230.1360502-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)Sean Christopherson
Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or gains a mechanism to let userspace opt-in to the dangers of exposing a hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of a hybrid PMU, is possible, but it requires careful, deliberate configuration from userspace. E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to prevent migrating a vCPU between a big core and a little core, userspace must enumerate a reasonable topology to the guest, and guest CPUID must be curated per vCPU to enumerate accurate vPMU capabilities. The last point is especially problematic, as KVM doesn't control which pCPU it runs on when enumerating KVM's vPMU capabilities to userspace, i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form. Alternatively, userspace could enable vPMU support by enumerating the set of features that are common and coherent across all cores, e.g. by filtering PMU events and restricting guest capabilities. But again, that requires userspace to take action far beyond reflecting KVM's supported feature set into the guest. For now, simply disable vPMU support on hybrid CPUs to avoid inducing seemingly random #GPs in guests, and punt support for hybrid CPUs to a future enabling effort. Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208204230.1360502-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15x86/build: Make 64-bit defconfig the defaultArnd Bergmann
Running 'make ARCH=x86 defconfig' on anything other than an x86_64 machine currently results in a 32-bit build, which is rarely what anyone wants these days. Change the default so that the 64-bit config gets used unless the user asks for i386_defconfig, uses ARCH=i386 or runs on a system that "uname -m" identifies as i386/i486/i586/i686. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230215091706.1623070-1-arnd@kernel.org
2023-02-15sched/psi: Fix use-after-free in ep_remove_wait_queue()Munehisa Kamata
If a non-root cgroup gets removed when there is a thread that registered trigger and is polling on a pressure file within the cgroup, the polling waitqueue gets freed in the following path: do_rmdir cgroup_rmdir kernfs_drain_open_files cgroup_file_release cgroup_pressure_release psi_trigger_destroy However, the polling thread still has a reference to the pressure file and will access the freed waitqueue when the file is closed or upon exit: fput ep_eventpoll_release ep_free ep_remove_wait_queue remove_wait_queue This results in use-after-free as pasted below. The fundamental problem here is that cgroup_file_release() (and consequently waitqueue's lifetime) is not tied to the file's real lifetime. Using wake_up_pollfree() here might be less than ideal, but it is in line with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()") since the waitqueue's lifetime is not tied to file's one and can be considered as another special case. While this would be fixable by somehow making cgroup_file_release() be tied to the fput(), it would require sizable refactoring at cgroups or higher layer which might be more justifiable if we identify more cases like this. BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0 Write of size 4 at addr ffff88810e625328 by task a.out/4404 CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38 Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017 Call Trace: <TASK> dump_stack_lvl+0x73/0xa0 print_report+0x16c/0x4e0 kasan_report+0xc3/0xf0 kasan_check_range+0x2d2/0x310 _raw_spin_lock_irqsave+0x60/0xc0 remove_wait_queue+0x1a/0xa0 ep_free+0x12c/0x170 ep_eventpoll_release+0x26/0x30 __fput+0x202/0x400 task_work_run+0x11d/0x170 do_exit+0x495/0x1130 do_group_exit+0x100/0x100 get_signal+0xd67/0xde0 arch_do_signal_or_restart+0x2a/0x2b0 exit_to_user_mode_prepare+0x94/0x100 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x52/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Allocated by task 4404: kasan_set_track+0x3d/0x60 __kasan_kmalloc+0x85/0x90 psi_trigger_create+0x113/0x3e0 pressure_write+0x146/0x2e0 cgroup_file_write+0x11c/0x250 kernfs_fop_write_iter+0x186/0x220 vfs_write+0x3d8/0x5c0 ksys_write+0x90/0x110 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 4407: kasan_set_track+0x3d/0x60 kasan_save_free_info+0x27/0x40 ____kasan_slab_free+0x11d/0x170 slab_free_freelist_hook+0x87/0x150 __kmem_cache_free+0xcb/0x180 psi_trigger_destroy+0x2e8/0x310 cgroup_file_release+0x4f/0xb0 kernfs_drain_open_files+0x165/0x1f0 kernfs_drain+0x162/0x1a0 __kernfs_remove+0x1fb/0x310 kernfs_remove_by_name_ns+0x95/0xe0 cgroup_addrm_files+0x67f/0x700 cgroup_destroy_locked+0x283/0x3c0 cgroup_rmdir+0x29/0x100 kernfs_iop_rmdir+0xd1/0x140 vfs_rmdir+0xfe/0x240 do_rmdir+0x13d/0x280 __x64_sys_rmdir+0x2c/0x30 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Mengchi Cheng <mengcc@amazon.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/ Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
2023-02-15Documentation/hw-vuln: Fix rST warningPaolo Bonzini
The following warning: Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst:92: ERROR: Unexpected indentation. was introduced by commit 493a2c2d23ca. Fix it by placing everything in the same paragraph and also use a monospace font. Fixes: 493a2c2d23ca ("Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions") Reported-by: Stephen Rothwell <sfr@canb@auug.org.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>