linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-02-15	btrfs: assert commit root semaphore is held when accessing backref cache	Filipe Manana
	During fiemap, when accessing the cache that stores the sharedness of an extent, we need to either be holding a transaction handle or the commit root semaphore. I left comments about this in the comment that precedes store_backref_shared_cache() and lookup_backref_shared_cache(), but have actually not enforced it through assertions. So assert that the commit root semaphore is held if we are not holding a transaction handle. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: hold block group refcount during async discard	Boris Burkov
	Async discard does not acquire the block group reference count while it holds a reference on the discard list. This is generally OK, as the paths which destroy block groups tend to try to synchronize on cancelling async discard work. However, relying on cancelling work requires careful analysis to be sure it is safe from races with unpinning scheduling more work. While I am unable to find a race with unpinning in the current code for either the unused bgs or relocation paths, I believe we have one in an older version of auto relocation in a Meta internal build. This suggests that this is in fact an error prone model, and could be fragile to future changes to these bg deletion paths. To make this ownership more clear, add a refcount for async discard. If work is queued for a block group, its refcount should be incremented, and when work is completed or canceled, it should be decremented. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: cache utimes operations for directories if possible	Filipe Manana
	Whenever we add or remove an entry to a directory, we issue an utimes command for the directory. If we add 1000 entries to a directory (create 1000 files under it or move 1000 files to it), then we issue the same utimes command 1000 times, which increases the send stream size, results in more pipe IO, one search in the send b+tree, allocating one path for the search, etc, as well as making the receiver do a system call for each duplicated utimes command. We also issue an utimes command when we create a new directory, but later we might add entries to it corresponding to inodes with an higher inode number, so it's pointless to issue the utimes command before we create the last inode under the directory. So use a lru cache to track directories for which we must send a utimes command. When we need to remove an entry from the cache, we issue the utimes command for the respective directory. When finishing the send operation, we go over each cache element and issue the respective utimes command. Finally the caching is entirely optional, just a performance optimization, meaning that if we fail to cache (due to memory allocation failure), we issue the utimes command right away, that is, we fallback to the previous, unoptimized, behaviour. This patch belongs to a patchset comprised of the following patches: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible The following test was run before and after applying the whole patchset, and on a non-debug kernel (Debian's default kernel config): #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT mkdir $MNT/A for ((i = 1; i <= 20000; i++)); do echo -n > $MNT/A/file_$i done btrfs subvolume snapshot -r $MNT $MNT/snap1 mkdir $MNT/B for ((i = 20000; i <= 40000; i++)); do echo -n > $MNT/B/file_$i done mv $MNT/A/file_* $MNT/B/ btrfs subvolume snapshot -r $MNT $MNT/snap2 start=$(date +%s%N) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "Incremental send took $dur milliseconds" umount $MNT Before the whole patchset: 18408 milliseconds After the whole patchset: 1942 milliseconds (9.5x speedup) Using 60000 files instead of 40000: Before the whole patchset: 39764 milliseconds After the whole patchset: 3076 milliseconds (12.9x speedup) Using 20000 files instead of 40000: Before the whole patchset: 5072 milliseconds After the whole patchset: 916 milliseconds (5.5x speedup) Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: update size of roots array for backref cache entries	Filipe Manana
	Currently we limit the size of the roots array, for backref cache entries, to 12 elements. This is because that number is enough for most cases and to make the backref cache entry size to be exactly 128 bytes, so that memory is allocated from the kmalloc-128 slab and no space is wasted. However recent changes in the series refactored the backref cache to be more generic and allow it to be reused for other purposes, which resulted in increasing the size of the embedded structure btrfs_lru_cache_entry in order to allow for supporting inode numbers as keys on 32 bits system and allow multiple generations per key. This resulted in increasing the size of struct backref_cache_entry from 128 bytes to 152 bytes. Since the cache entries are allocated with kmalloc(), it means we end up using the slab kmalloc-192, so we end up wasting 40 bytes of memory. So bump the size of the roots array from 12 elements to 17 elements, so we end up using 192 bytes for each backref cache entry. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	btrfs: send: use the lru cache to implement the name cache	Filipe Manana
	The name cache in send is basically a lru cache implemented with a radix tree and linked lists, very similar to the lru cache module which is used for the send backref cache and the cache of previously created directories during a send operation. So remove all the custom caching code for the name cache and make it use the lru cache instead. One particular detail to note is that the current cache behaves a bit differently when it comes to eviction of entries. Namely when after inserting a new name in the cache, if the cache now has 256 entries, we evict the last 128 LRU entries. The lru_cache.{c,h} module behaves a bit differently in that once we reach the cache limit, we evict a single LRU entry. In practice this doesn't make much difference, but it's actually better to evict just one entry instead of half of the entries, as there's always a chance we will need a name stored in one of that last 128 removed entries. This patch is part of a larger patchset and the changelog of the last patch in the series contains a sample performance test and results. The patches that comprise the patchset are the following: btrfs: send: directly return from did_overwrite_ref() and simplify it btrfs: send: avoid unnecessary generation search at did_overwrite_ref() btrfs: send: directly return from will_overwrite_ref() and simplify it btrfs: send: avoid extra b+tree searches when checking reference overrides btrfs: send: remove send_progress argument from can_rmdir() btrfs: send: avoid duplicated orphan dir allocation and initialization btrfs: send: avoid unnecessary orphan dir rbtree search at can_rmdir() btrfs: send: reduce searches on parent root when checking if dir can be removed btrfs: send: iterate waiting dir move rbtree only once when processing refs btrfs: send: initialize all the red black trees earlier btrfs: send: genericize the backref cache to allow it to be reused btrfs: adapt lru cache to allow for 64 bits keys on 32 bits systems btrfs: send: cache information about created directories btrfs: allow a generation number to be associated with lru cache entries btrfs: add an api to delete a specific entry from the lru cache btrfs: send: use the lru cache to implement the name cache btrfs: send: update size of roots array for backref cache entries btrfs: send: cache utimes operations for directories if possible Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15	thermal/drivers/st: Remove syscfg based driver	Alain Volmat
	The syscfg based thermal driver is only supporting STiH415 STiH416 and STiD127 platforms which are all no more supported. We can thus safely remove this driver since the remaining STi platform STiH407/STiH410 and STiH418 are all using the memmap based thermal driver. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20230209091659.1409-7-avolmat@me.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal: Remove core header inclusion from drivers	Daniel Lezcano
	As the name states "thermal_core.h" is the header file for the core components of the thermal framework. Too many drivers are including it. Hopefully the recent cleanups helped to self encapsulate the code a bit more and prevented the drivers to need this header. Remove this inclusion in every place where it is possible. Some other drivers did a confusion with the core header and the one exported in linux/thermal.h. They include the former instead of the latter. The changes also fix this. The tegra/soctherm driver still remains as it uses an internal function which need to be replaced. The Intel HFI driver uses the netlink internal framework core and should be changed to prevent to deal with the internals. No functional changes intended. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> # armada_thermal.c Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> # uniphier_thermal.c Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> # rcar_gen3_thermal.c Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> # amlogic_thermal.c Acked-by: Florian Fainelli <f.fainelli@gmail.com> # bcm2835_thermal.c Acked-by: Thierry Reding <treding@nvidia.com> # tegra30-tsensor.c Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	tools/lib/thermal: Fix include path for libnl3 in pkg-config file.	Vibhav Pant
	Fixes pkg-config returning malformed CFLAGS for libthermal. Signed-off-by: Vibhav Pant <vibhavp@gmail.com> Link: https://lore.kernel.org/r/20230211081935.62690-1-vibhavp@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/hisi: Drop second sensor hi3660	Yongqin Liu
	The commit 74c8e6bffbe1 ("driver core: Add __alloc_size hint to devm allocators") exposes a panic "BRK handler: Fatal exception" on the hi3660_thermal_probe funciton. This is because the function allocates memory for only one sensors array entry, but tries to fill up a second one. Fix this by removing the unneeded second access. Fixes: 7d3a2a2bbadb ("thermal/drivers/hisi: Fix number of sensors on hi3660") Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org> Link: https://lore.kernel.org/linux-mm/20221101223321.1326815-5-keescook@chromium.org/ Link: https://lore.kernel.org/r/20230210141507.71014-1-yongqin.liu@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/rcar_gen3_thermal: Fix device initialization	Niklas Söderlund
	The thermal zone is registered before the device is register and the thermal coefficients are calculated, providing a window for very incorrect readings. The reason why the zone was register before the device was fully initialized was that the presence of the set_trips() callback is used to determine if the driver supports interrupt or not, as it is not defined if the device is incapable of interrupts. Fix this by using the operations structure in the private data instead of the zone to determine if interrupts are available or not, and initialize the device before registering the zone. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20230208190333.3159879-4-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/rcar_gen3_thermal: Create device local ops struct	Niklas Söderlund
	The callback operations are modified on a driver global level. If one device tree description do not define interrupts, the set_trips() operation was disabled globally for all users of the driver. Fix this by creating a device local copy of the operations structure and modify the copy depending on what the device can do. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20230208190333.3159879-3-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming	Niklas Söderlund
	There is no need to explicitly call set_trips() when resuming from suspend. The thermal framework calls thermal_zone_device_update() that restores the trip points. Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20230208190333.3159879-2-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/rcar_gen3: Add support for R-Car V4H	Geert Uytterhoeven
	Add support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4H (R8A779G0) SoC. According to the R-Car V4H Hardware User's Manual Rev. 0.70, the (preliminary) conversion formula for the thermal sensor is the same as for most other R-Car Gen3 and Gen4 SoCs, while the (preliminary) conversion formula for the chip internal voltage monitor differs. As the driver only uses the former, no further changes are needed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/852048eb5f4cc001be7a97744f4c5caea912d071.1675958665.git.geert+renesas@glider.be Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support	Geert Uytterhoeven
	Document support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4H (R8A779G0) SoC. Unlike most other R-Car Gen3 and Gen4 SoCs, it has 4 instead of 3 sensors, so increase the maximum number of reg tuples. Just like other R-Car Gen4 SoCs, interrupts are not routed to the INTC-AP but to the ECM. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/11f740522ec479011cc8eef6bb450603be394def.1675958665.git.geert+renesas@glider.be Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver	Balsam CHIHI
	The Low Voltage Thermal Sensor (LVTS) is a multiple sensors, multi controllers contained in a thermal domain. A thermal domains can be the MCU or the AP. Each thermal domains contain up to seven controllers, each thermal controller handle up to four thermal sensors. The LVTS has two Finite State Machines (FSM), one to handle the functionin temperatures range like hot or cold temperature and another one to handle monitoring trip point. The FSM notifies via interrupts when a trip point is crossed. The interrupt is managed at the thermal controller level, so when an interrupt occurs, the driver has to find out which sensor triggered such an interrupt. The sampling of the thermal can be filtered or immediate. For the former, the LVTS measures several points and applies a low pass filter. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> On MT8195 Tomato Chromebook: Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230209105628.50294-5-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	dt-bindings: thermal: mediatek: Add LVTS thermal controllers	Balsam CHIHI
	Add LVTS thermal controllers dt-binding definition for mt8192 and mt8195. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230209105628.50294-3-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	thermal/drivers/mediatek: Relocate driver to mediatek folder	Balsam CHIHI
	Add MediaTek proprietary folder to upstream more thermal zone and cooler drivers, relocate the original thermal controller driver to it, and rename it as "auxadc_thermal.c" to show its purpose more clearly. Signed-off-by: Balsam CHIHI <bchihi@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230209105628.50294-2-bchihi@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	tools/lib/thermal: Fix thermal_sampling_exit()	Vincent Guittot
	thermal_sampling_init() suscribes to THERMAL_GENL_SAMPLING_GROUP_NAME group so thermal_sampling_exit() should unsubscribe from the same group. Fixes: 47c4b0de080a ("tools/lib/thermal: Add a thermal library") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230202102812.453357-1-vincent.guittot@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	Merge branch 'thermal-intel'	Rafael J. Wysocki
	Merge thermal control changes related to Intel platforms for 6.3-rc1: - Rework ACPI helper functions for thermal control to retrieve a trip point temperature instead of initializing a trip point objetc (Rafael Wysocki). - Clean up and improve the int340x thermal driver ((Rafael Wysocki). - Simplify and clean up the intel_pch thermal driver ((Rafael Wysocki). - Fix the Intel powerclamp thermal driver and make it use the common idle injection framework (Srinivas Pandruvada). - Add two module parameters, cpumask and max_idle, to the Intel powerclamp thermal driver to allow it to affect only a specific subset of CPUs instead of all of them (Srinivas Pandruvada). - Make the Intel quark_dts thermal driver Use generic trip point objects instead of its own trip point representation (Daniel Lezcano). - Add toctree entry for thermal documents and fix two issues in the Intel powerclamp driver documentation (Bagas Sanjaya). * thermal-intel: (25 commits) Documentation: powerclamp: Fix numbered lists formatting Documentation: powerclamp: Escape wildcard in cpumask description Documentation: admin-guide: Add toctree entry for thermal docs thermal: intel: powerclamp: Add two module parameters Documentation: admin-guide: Move intel_powerclamp documentation thermal: intel: powerclamp: Fix duration module parameter thermal: intel: powerclamp: Return last requested state as cur_state thermal: intel: quark_dts: Use generic trip points thermal: intel: powerclamp: Use powercap idle-inject feature powercap: idle_inject: Add update callback powercap: idle_inject: Export symbols thermal: intel: powerclamp: Fix cur_state for multi package system thermal: intel: intel_pch: Drop struct board_info thermal: intel: intel_pch: Rename board ID symbols thermal: intel: intel_pch: Fold suspend and resume routines into their callers thermal: intel: intel_pch: Fold two functions into their callers thermal: intel: intel_pch: Eliminate device operations object thermal: intel: intel_pch: Rename device operations callbacks thermal: intel: intel_pch: Eliminate redundant return pointers thermal: intel: intel_pch: Make pch_wpt_add_acpi_psv_trip() return int ...
2023-02-15	Merge branch 'thermal-core'	Rafael J. Wysocki
	Merge thermal control core changes for 6.3-rc1: - Clean up thermal device unregistration code (Viresh Kumar). - Fix and clean up thermal control core initialization error code paths (Daniel Lezcano). - Relocate the trip points handling code into a separate file (Daniel Lezcano). - Make the thermal core fail registration of thermal zones and cooling devices if the thermal class has not been registered (Rafael Wysocki). - Make the core thermal control code use sysfs_emit_at() instead of scnprintf() where applicable (ye xingchen). * thermal-core: thermal: core: Use sysfs_emit_at() instead of scnprintf() thermal: Fail object registration if thermal class is not registered thermal/core: Move the thermal trip code to a dedicated file thermal/core: Remove unneeded ida_destroy() thermal/core: Fix unregistering netlink at thermal init time thermal: core: Use device_unregister() instead of device_del/put() thermal: core: Move cdev cleanup to thermal_release()
2023-02-15	gpio: mlxbf2: select GPIOLIB_IRQCHIP	Linus Walleij
	This driver uncondictionally uses the GPIOLIB_IRQCHIP so select it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-02-15	Merge branches 'pm-cpuidle', 'pm-core' and 'pm-sleep'	Rafael J. Wysocki
	Merge cpuidle updates, PM core updates and changes related to system sleep handling for 6.3-rc1: - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski). - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() which allows MWAIT to be used (Li RongQing). - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy). - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann). - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh). - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki). - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski). - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald). - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald). - Drop "select SRCU" from system sleep Kconfig (Paul E. McKenney). - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap). * pm-cpuidle: cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies intel_idle: add Emerald Rapids Xeon support cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle() cpuidle-haltpoll: select haltpoll governor cpuidle: teo: Introduce util-awareness cpuidle: teo: Optionally skip polling states in teo_find_shallower_state() * pm-core: PM: Add EXPORT macros for exporting PM functions PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND * pm-sleep: PM: sleep: Remove "select SRCU" PM: hibernate: swap: don't use /** for non-kernel-doc comments
2023-02-15	gpiolib: acpi: Add a ignore wakeup quirk for Clevo NH5xAx	Werner Sembach
	The commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") changed the policy such that I2C touchpads may be able to wake up the system by default if the system is configured as such. However for some devices there is a bug, that is causing the touchpad to instantly wake up the device again once it gets deactivated. The root cause is still under investigation (see Link tag). To workaround this problem for the time being, introduce a quirk for this model that will prevent the wakeup capability for being set for GPIO 16. Fixes: 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") Link: https://lore.kernel.org/linux-acpi/20230210164636.628462-1-wse@tuxedocomputers.com/ Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2023-02-15	Documentation: amd-pstate: disambiguate user space sections	Bagas Sanjaya
	kernel test robot reported htmldocs warning: Documentation/admin-guide/pm/amd-pstate.rst:343: WARNING: duplicate label admin-guide/pm/amd-pstate:user space interface in ``sysfs``, other instance in Documentation/admin-guide/pm/amd-pstate.rst The documentation contains two sections with the same "User Space Interface in ``sysfs``" title. The first one deals with per-policy sysfs and the second one is about general attributes (currently only global attributes are documented). Disambiguate title text of both sections to fix the warning. Link: https://lore.kernel.org/linux-doc/202302151041.0SWs1RHK-lkp@intel.com/ Fixes: b9e6a2d47b2565 ("Documentation: amd-pstate: introduce new global sysfs attributes") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ	Wyes Karny
	`amd_pstate_set_epp` function uses `cppc_req_cached` and `epp` variable to update the MSR_AMD_CPPC_REQ register for AMD MSR systems. The recent commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") changed the sequence of updating cppc_req_cached and writing the MSR_AMD_CPPC_REQ. Therefore while switching from powersave to performance governor and vice-versa in active mode MSR_AMD_CPPC_REQ is set with the previous cached value. To fix this: first update the `cppc_req_cached` variable and then call `amd_pstate_set_epp` function. - Before commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.652743] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 1.652744] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 1.652746] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 300.493842] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 300.493846] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 300.493847] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 - After commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.646037] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 1.646038] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 [ 1.646042] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff Changing to performance governor: [ 687.117401] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 687.117405] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 [ 687.117419] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff - After this fix: With powersave governor: [ 2.525717] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 2.525720] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 2.525722] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 3440.152468] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 3440.152473] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 3440.152474] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 Fixes: 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") Signed-off-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15	gpio: vf610: make irq_chip immutable	Alexander Stein
	Since recently, the kernel is nagging about mutable irq_chips: "not an immutable chip, please consider fixing it!" Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new helper functions and call the appropriate gpiolib functions. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-02-15	Merge branches 'acpi-video', 'acpi-misc' and 'acpi-docs'	Rafael J. Wysocki
	Merge ACPI backlight driver changes, miscellaneous ACPI-related changes and ACPI-related documentation updates for 6.3-rc1: - Fix Lenovo Ideapad Z570 DMI match in the ACPI backlight driver (Hans de Goede). - Silence missing prototype warnings in some places in the ACPI-related code (Ammar Faizi). - Make kobj_type structures used in the ACPI code constant (Thomas Weißschuh). - Correct spelling in firmware-guide/ACPI (Randy Dunlap). - Clarify the meaning of Explicit and Implicit in the _DSD GPIO properties documentation (Andy Shevchenko). * acpi-video: ACPI: video: Fix Lenovo Ideapad Z570 DMI match * acpi-misc: ACPI: make kobj_type structures constant ACPI: Silence missing prototype warnings * acpi-docs: Documentation: firmware-guide: gpio-properties: Clarify Explicit and Implicit Documentation: firmware-guide/ACPI: correct spelling
2023-02-15	Merge branches 'acpi-resource', 'acpi-pmic', 'acpi-battery' and 'acpi-apei'	Rafael J. Wysocki
	Merge ACPI resources handling changes, ACPI PMIC and battery drivers changes and ACPI APEI changes for 6.3-rc1: - Add two more entries to the ACPI IRQ override quirk list (Adam Niederer, Werner Sembach). - Add a pmic_i2c_address entry for Intel Bay Trail Crystal Cove to allow intel_soc_pmic_exec_mipi_pmic_seq_element() to be used with the Bay Trail Crystal Cove PMIC OpRegion driver (Hans de Goede). - Add comments with DSDT power OpRegion field names to the ACPI PMIC driver (Hans de Goede). - Fix string termination handling in the ACPI battery driver (Armin Wolf). - Limit error type to 32-bit width in the ACPI APEI error injection code (Shuai Xue). * acpi-resource: ACPI: resource: Do IRQ override on all TongFang GMxRGxx ACPI: resource: Add IRQ overrides for MAINGEAR Vector Pro 2 models * acpi-pmic: ACPI: PMIC: Add comments with DSDT power opregion field names ACPI: PMIC: Add pmic_i2c_address to BYT Crystal Cove support * acpi-battery: ACPI: battery: Increase maximum string length ACPI: battery: Fix buffer overread if not NUL-terminated ACPI: battery: Fix missing NUL-termination with large strings * acpi-apei: ACPI: APEI: EINJ: Limit error type to 32-bit width
2023-02-15	Merge branches 'acpi-processor', 'acpi-tables', 'acpi-pnp' and ↵	Rafael J. Wysocki
	'acpi-maintainers' Merge ACPI processor driver changes, ACPI table parser changes, ACPI device enumeration changes related to PNP and a MAINTAINERS update related to ACPI for 6.3-rc1: - Drop an unnecessary (void ) conversion from the ACPI processor driver (Zhou jie). - Modify the ACPI processor performance library code to use the "no limit" frequency QoS as appropriate and adjust the intel_pstate driver accordingly (Rafael Wysocki). - Add support for NBFT to the ACPI table parser (Stuart Hayes). - Introduce list of known non-PNP devices to avoid enumerating some of them as PNP devices (Rafael Wysocki). - Add x86 ACPI paths to the ACPI entry in MAINTAINERS to allow scripts to report the actual maintainers information (Rafael Wysocki). acpi-processor: cpufreq: intel_pstate: Drop ACPI _PSS states table patching ACPI: processor: perflib: Avoid updating frequency QoS unnecessarily ACPI: processor: perflib: Use the "no limit" frequency QoS ACPI: processor: idle: Drop unnecessary (void ) conversion acpi-tables: ACPI: tables: Add support for NBFT * acpi-pnp: ACPI: PNP: Introduce list of known non-PNP devices * acpi-maintainers: MAINTAINERS: Add x86 ACPI paths to the ACPI entry
2023-02-15	Merge branch 'acpica'	Rafael J. Wysocki
	Merge ACPICA changes for 6.3-rc1: - Drop port I/O validation for some regions to avoid AML failures due to rejections of legitimate port I/O writes (Mario Limonciello). - Constify acpi_get_handle() pathname argument to allow its callers to pass conts pathnames to it (Sakari Ailus). - Prevent acpi_ns_simple_repair() from crashing in some cases when AE_AML_NO_RETURN_VALUE should be returned (Daniil Tatianin). - Fix typo in CDAT DSMAS struct definition (Lukas Wunner). * acpica: ACPICA: Fix typo in CDAT DSMAS struct definition ACPICA: nsrepair: handle cases without a return value correctly ACPICA: Constify pathname argument for acpi_get_handle() ACPICA: Drop port I/O validation for some regions
2023-02-15	Merge tag 'qcom-arm64-defconfig-for-6.3-3' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/defconfig Few more Qualcomm ARM64 defconfig updates for v6.3 This enables the drivers needed to support USB Type-C based external display on the SC8280XP laptops. It also enables a couple of core drivers for the Qualcomm SA8775P platform. * tag 'qcom-arm64-defconfig-for-6.3-3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform arm64: defconfig: Enable DisplayPort on SC8280XP laptops Link: https://lore.kernel.org/r/20230215051757.1166709-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-15	Merge tag 'qcom-arm64-for-6.3-3' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt Last set of Qualcomm ARM64 DTS updates for v6.3 This introduces additional DisplayPort controllers and pmic_glink on SC8280XP (8cx Gen3), which provides support for USB Type-C-based displays on the the Lenovo ThinkPad X13s and the compute reference device. The pmic_glink also provides battery and power supply status. Interrupt-parents are corrected across the SC8280XP PMICs, to allow non-Linux OSs to properly handle interrupts in the various blocks therein. It cleans up the SM8350 base dtsi and introduces GPU support on this platform, as well as enable this for the Hardware Development Kit (HDK). It enables i2c busses on the Fairphone FP4 Lastly it aligns glink node names with bindings across a few platforms, and corrects the compatible for the PON block in the pmk8350 PMIC. * tag 'qcom-arm64-for-6.3-3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings arm64: dts: qcom: qcs404: align RPM G-Link node with bindings arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam arm64: dts: qcom: sc7280: Adjust zombie PWM frequency arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses arm64: dts: qcom: sm7225-fairphone-fp4: move status property down arm64: dts: qcom: pmk8350: Use the correct PON compatible arm64: dts: qcom: sc8280xp-x13s: Enable external display arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks arm64: dts: qcom: sm8350-hdk: enable GPU arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes arm64: dts: qcom: sm8350: finish reordering nodes arm64: dts: qcom: sm8350: move more nodes to correct place arm64: dts: qcom: sm8350: reorder device nodes Link: https://lore.kernel.org/r/20230215051530.1165953-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-15	gpiolib: acpi: remove redundant declaration	Raag Jadav
	Remove acpi_device declaration, as it is no longer needed. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2023-02-15	perf/x86: Refuse to export capabilities for hybrid PMUs	Sean Christopherson
	Now that KVM disables vPMU support on hybrid CPUs, WARN and return zeros if perf_get_x86_pmu_capability() is invoked on a hybrid CPU. The helper doesn't provide an accurate accounting of the PMU capabilities for hybrid CPUs and needs to be enhanced if KVM, or anything else outside of perf, wants to act on the PMU capabilities. Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208204230.1360502-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15	KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)	Sean Christopherson
	Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or gains a mechanism to let userspace opt-in to the dangers of exposing a hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of a hybrid PMU, is possible, but it requires careful, deliberate configuration from userspace. E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to prevent migrating a vCPU between a big core and a little core, userspace must enumerate a reasonable topology to the guest, and guest CPUID must be curated per vCPU to enumerate accurate vPMU capabilities. The last point is especially problematic, as KVM doesn't control which pCPU it runs on when enumerating KVM's vPMU capabilities to userspace, i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form. Alternatively, userspace could enable vPMU support by enumerating the set of features that are common and coherent across all cores, e.g. by filtering PMU events and restricting guest capabilities. But again, that requires userspace to take action far beyond reflecting KVM's supported feature set into the guest. For now, simply disable vPMU support on hybrid CPUs to avoid inducing seemingly random #GPs in guests, and punt support for hybrid CPUs to a future enabling effort. Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208204230.1360502-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15	x86/build: Make 64-bit defconfig the default	Arnd Bergmann
	Running 'make ARCH=x86 defconfig' on anything other than an x86_64 machine currently results in a 32-bit build, which is rarely what anyone wants these days. Change the default so that the 64-bit config gets used unless the user asks for i386_defconfig, uses ARCH=i386 or runs on a system that "uname -m" identifies as i386/i486/i586/i686. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230215091706.1623070-1-arnd@kernel.org
2023-02-15	sched/psi: Fix use-after-free in ep_remove_wait_queue()	Munehisa Kamata
	If a non-root cgroup gets removed when there is a thread that registered trigger and is polling on a pressure file within the cgroup, the polling waitqueue gets freed in the following path: do_rmdir cgroup_rmdir kernfs_drain_open_files cgroup_file_release cgroup_pressure_release psi_trigger_destroy However, the polling thread still has a reference to the pressure file and will access the freed waitqueue when the file is closed or upon exit: fput ep_eventpoll_release ep_free ep_remove_wait_queue remove_wait_queue This results in use-after-free as pasted below. The fundamental problem here is that cgroup_file_release() (and consequently waitqueue's lifetime) is not tied to the file's real lifetime. Using wake_up_pollfree() here might be less than ideal, but it is in line with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()") since the waitqueue's lifetime is not tied to file's one and can be considered as another special case. While this would be fixable by somehow making cgroup_file_release() be tied to the fput(), it would require sizable refactoring at cgroups or higher layer which might be more justifiable if we identify more cases like this. BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0 Write of size 4 at addr ffff88810e625328 by task a.out/4404 CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38 Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017 Call Trace: <TASK> dump_stack_lvl+0x73/0xa0 print_report+0x16c/0x4e0 kasan_report+0xc3/0xf0 kasan_check_range+0x2d2/0x310 _raw_spin_lock_irqsave+0x60/0xc0 remove_wait_queue+0x1a/0xa0 ep_free+0x12c/0x170 ep_eventpoll_release+0x26/0x30 __fput+0x202/0x400 task_work_run+0x11d/0x170 do_exit+0x495/0x1130 do_group_exit+0x100/0x100 get_signal+0xd67/0xde0 arch_do_signal_or_restart+0x2a/0x2b0 exit_to_user_mode_prepare+0x94/0x100 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x52/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Allocated by task 4404: kasan_set_track+0x3d/0x60 __kasan_kmalloc+0x85/0x90 psi_trigger_create+0x113/0x3e0 pressure_write+0x146/0x2e0 cgroup_file_write+0x11c/0x250 kernfs_fop_write_iter+0x186/0x220 vfs_write+0x3d8/0x5c0 ksys_write+0x90/0x110 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 4407: kasan_set_track+0x3d/0x60 kasan_save_free_info+0x27/0x40 ____kasan_slab_free+0x11d/0x170 slab_free_freelist_hook+0x87/0x150 __kmem_cache_free+0xcb/0x180 psi_trigger_destroy+0x2e8/0x310 cgroup_file_release+0x4f/0xb0 kernfs_drain_open_files+0x165/0x1f0 kernfs_drain+0x162/0x1a0 __kernfs_remove+0x1fb/0x310 kernfs_remove_by_name_ns+0x95/0xe0 cgroup_addrm_files+0x67f/0x700 cgroup_destroy_locked+0x283/0x3c0 cgroup_rmdir+0x29/0x100 kernfs_iop_rmdir+0xd1/0x140 vfs_rmdir+0xfe/0x240 do_rmdir+0x13d/0x280 __x64_sys_rmdir+0x2c/0x30 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Mengchi Cheng <mengcc@amazon.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/ Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
2023-02-15	Documentation/hw-vuln: Fix rST warning	Paolo Bonzini
	The following warning: Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst:92: ERROR: Unexpected indentation. was introduced by commit 493a2c2d23ca. Fix it by placing everything in the same paragraph and also use a monospace font. Fixes: 493a2c2d23ca ("Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions") Reported-by: Stephen Rothwell <sfr@canb@auug.org.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15	net: mpls: fix stale pointer if allocation fails during device rename	Jakub Kicinski
	lianhui reports that when MPLS fails to register the sysctl table under new location (during device rename) the old pointers won't get overwritten and may be freed again (double free). Handle this gracefully. The best option would be unregistering the MPLS from the device completely on failure, but unfortunately mpls_ifdown() can fail. So failing fully is also unreliable. Another option is to register the new table first then only remove old one if the new one succeeds. That requires more code, changes order of notifications and two tables may be visible at the same time. sysctl point is not used in the rest of the code - set to NULL on failures and skip unregister if already NULL. Reported-by: lianhui tang <bluetlh@gmail.com> Fixes: 0fae3bf018d9 ("mpls: handle device renames for per-device sysctls") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-15	net/sched: tcindex: search key must be 16 bits	Pedro Tammela
	Syzkaller found an issue where a handle greater than 16 bits would trigger a null-ptr-deref in the imperfect hash area update. general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted 6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509 Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00 RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8 RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8 R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00 FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572 tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155 rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x334/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmmsg+0x18f/0x460 net/socket.c:2616 __do_sys_sendmmsg net/socket.c:2645 [inline] __se_sys_sendmmsg net/socket.c:2642 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 Fixes: ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-15	s390/irq,idle: simplify idle check	Heiko Carstens
	Use the per-cpu CIF_ENABLED_WAIT flag to decide if an interrupt occurred while a cpu was idle, instead of checking two conditions within the old psw. Also move clearing of the CIF_ENABLED_WAIT bit to the early interrupt handler, which in turn makes arch_vcpu_is_preempted() also a bit more precise, since the flag is now cleared before interrupt handlers have been called. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-15	s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag()	Heiko Carstens
	Add test_and_set_cpu_flag() and test_and_clear_cpu_flag() helper functions. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-15	s390/processor: let cpu helper functions return boolean values	Heiko Carstens
	Let cpu helper functions return boolean values. This also allows to make the code a bit simpler by getting rid of the "!!" construct. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-15	s390/kfence: fix page fault reporting	Heiko Carstens
	Baoquan He reported lots of KFENCE reports when /proc/kcore is read, e.g. with crash or even simpler with dd: BUG: KFENCE: invalid read in copy_from_kernel_nofault+0x5e/0x120 Invalid read at 0x00000000f4f5149f: copy_from_kernel_nofault+0x5e/0x120 read_kcore+0x6b2/0x870 proc_reg_read+0x9a/0xf0 vfs_read+0x94/0x270 ksys_read+0x70/0x100 __do_syscall+0x1d0/0x200 system_call+0x82/0xb0 The reason for this is that read_kcore() simply reads memory that might have been unmapped by KFENCE with copy_from_kernel_nofault(). Any fault due to pages being unmapped by KFENCE would be handled gracefully by the fault handler (exception table fixup). However the s390 fault handler first reports the fault, and only afterwards would perform the exception table fixup. Most architectures have this in reversed order, which also avoids the false positive KFENCE reports when an unmapped page is accessed. Therefore change the s390 fault handler so it handles exception table fixups before KFENCE page faults are reported. Reported-by: Baoquan He <bhe@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Acked-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20230213183858.1473681-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-15	s390/zcrypt: introduce ctfm field in struct CPRBX	Harald Freudenberger
	Modify the CPRBX struct to expose a new field ctfm for use with hardware command filtering within a CEX8 crypto card in CCA coprocessor mode. The field replaces a reserved byte padding field so that the layout of the struct and the size does not change. The new field is used only by user space applications which may use this to expose the HW filtering facilities in the crypto firmware layers. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-14	tipc: fix kernel warning when sending SYN message	Tung Nguyen
	When sending a SYN message, this kernel stack trace is observed: ... [ 13.396352] RIP: 0010:_copy_from_iter+0xb4/0x550 ... [ 13.398494] Call Trace: [ 13.398630] <TASK> [ 13.398630] ? __alloc_skb+0xed/0x1a0 [ 13.398630] tipc_msg_build+0x12c/0x670 [tipc] [ 13.398630] ? shmem_add_to_page_cache.isra.71+0x151/0x290 [ 13.398630] __tipc_sendmsg+0x2d1/0x710 [tipc] [ 13.398630] ? tipc_connect+0x1d9/0x230 [tipc] [ 13.398630] ? __local_bh_enable_ip+0x37/0x80 [ 13.398630] tipc_connect+0x1d9/0x230 [tipc] [ 13.398630] ? __sys_connect+0x9f/0xd0 [ 13.398630] __sys_connect+0x9f/0xd0 [ 13.398630] ? preempt_count_add+0x4d/0xa0 [ 13.398630] ? fpregs_assert_state_consistent+0x22/0x50 [ 13.398630] __x64_sys_connect+0x16/0x20 [ 13.398630] do_syscall_64+0x42/0x90 [ 13.398630] entry_SYSCALL_64_after_hwframe+0x63/0xcd It is because commit a41dad905e5a ("iov_iter: saner checks for attempt to copy to/from iterator") has introduced sanity check for copying from/to iov iterator. Lacking of copy direction from the iterator viewpoint would lead to kernel stack trace like above. This commit fixes this issue by initializing the iov iterator with the correct copy direction when sending SYN or ACK without data. Fixes: f25dcc7687d4 ("tipc: tipc ->sendmsg() conversion") Reported-by: syzbot+d43608d061e8847ec9f3@syzkaller.appspotmail.com Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20230214012606.5804-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14	igb: Fix PPS input and output using 3rd and 4th SDP	Miroslav Lichvar
	Fix handling of the tsync interrupt to compare the pin number with IGB_N_SDP instead of IGB_N_EXTTS/IGB_N_PEROUT and fix the indexing to the perout array. Fixes: cf99c1dd7b77 ("igb: move PEROUT and EXTTS isr logic to separate functions") Reported-by: Matt Corallo <ntp-lists@mattcorallo.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230213185822.3960072-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14	Merge branch '100GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-13 (ice) This series contains updates to ice driver only. Michal fixes check of scheduling node weight and priority to be done against desired value, not current value. Jesse adds setting of all multicast when adding promiscuous mode to resolve traffic being lost due to filter settings. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: fix lost multicast packets in promisc mode ice: Fix check for weight and priority of a scheduling node ==================== Link: https://lore.kernel.org/r/20230213185259.3959224-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14	net: use a bounce buffer for copying skb->mark	Eric Dumazet
	syzbot found arm64 builds would crash in sock_recv_mark() when CONFIG_HARDENED_USERCOPY=y x86 and powerpc are not detecting the issue because they define user_access_begin. This will be handled in a different patch, because a check_object_size() is missing. Only data from skb->cb[] can be copied directly to/from user space, as explained in commit 79a8a642bf05 ("net: Whitelist the skbuff_head_cache "cb" field") syzbot report was: usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 168, size 4)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 4410 Comm: syz-executor533 Not tainted 6.2.0-rc7-syzkaller-17907-g2d3827b3f393 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usercopy_abort+0x90/0x94 mm/usercopy.c:90 lr : usercopy_abort+0x90/0x94 mm/usercopy.c:90 sp : ffff80000fb9b9a0 x29: ffff80000fb9b9b0 x28: ffff0000c6073400 x27: 0000000020001a00 x26: 0000000000000014 x25: ffff80000cf52000 x24: fffffc0000000000 x23: 05ffc00000000200 x22: fffffc000324bf80 x21: ffff0000c92fe1a8 x20: 0000000000000001 x19: 0000000000000004 x18: 0000000000000000 x17: 656a626f2042554c x16: ffff0000c6073dd0 x15: ffff80000dbd2118 x14: ffff0000c6073400 x13: 00000000ffffffff x12: ffff0000c6073400 x11: ff808000081bbb4c x10: 0000000000000000 x9 : 7b0572d7cc0ccf00 x8 : 7b0572d7cc0ccf00 x7 : ffff80000bf650d4 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000 x2 : ffff0001fefbff08 x1 : 0000000100000000 x0 : 000000000000006c Call trace: usercopy_abort+0x90/0x94 mm/usercopy.c:90 __check_heap_object+0xa8/0x100 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size+0x208/0x6b8 mm/usercopy.c:251 check_object_size include/linux/thread_info.h:199 [inline] __copy_to_user include/linux/uaccess.h:115 [inline] put_cmsg+0x408/0x464 net/core/scm.c:238 sock_recv_mark net/socket.c:975 [inline] __sock_recv_cmsgs+0x1fc/0x248 net/socket.c:984 sock_recv_cmsgs include/net/sock.h:2728 [inline] packet_recvmsg+0x2d8/0x678 net/packet/af_packet.c:3482 ____sys_recvmsg+0x110/0x3a0 ___sys_recvmsg net/socket.c:2737 [inline] __sys_recvmsg+0x194/0x210 net/socket.c:2767 __do_sys_recvmsg net/socket.c:2777 [inline] __se_sys_recvmsg net/socket.c:2774 [inline] __arm64_sys_recvmsg+0x2c/0x3c net/socket.c:2774 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x64/0x178 arch/arm64/kernel/syscall.c:52 el0_svc_common+0xbc/0x180 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x48/0x110 arch/arm64/kernel/syscall.c:193 el0_svc+0x58/0x14c arch/arm64/kernel/entry-common.c:637 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: 91388800 aa0903e1 f90003e8 94e6d752 (d4210000) Fixes: 6fd1d51cfa25 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Erin MacNeil <lnx.erin@gmail.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20230213160059.3829741-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14	drm/vmwgfx: Do not drop the reference to the handle too soon	Zack Rusin
	v3: Fix vmw_user_bo_lookup which was also dropping the gem reference before the kernel was done with buffer depending on userspace doing the right thing. Same bug, different spot. It is possible for userspace to predict the next buffer handle and to destroy the buffer while it's still used by the kernel. Delay dropping the internal reference on the buffers until kernel is done with them. Instead of immediately dropping the gem reference in vmw_user_bo_lookup and vmw_gem_object_create_with_handle let the callers decide when they're ready give the control back to userspace. Also fixes the second usage of vmw_gem_object_create_with_handle in vmwgfx_surface.c which wasn't grabbing an explicit reference to the gem object which could have been destroyed by the userspace on the owning surface at any point. Signed-off-by: Zack Rusin <zackr@vmware.com> Fixes: 8afa13a0583f ("drm/vmwgfx: Implement DRIVER_GEM") Reviewed-by: Martin Krastev <krastevm@vmware.com> Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230211050514.2431155-1-zack@kde.org (cherry picked from commit 9ef8d83e8e25d5f1811b3a38eb1484f85f64296c) Cc: <stable@vger.kernel.org> # v5.17+