linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-08-17	genpd: imx: scu-pd: do not power off console if no_console_suspend	Peng Fan
	Do not power off console if no_console_suspend, this will leave the serial device's corresponding PM domain on during system suspend, which is useful for debug system suspend. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-17	genpd: imx: scu-pd: add more PDs	Peng Fan
	Add more PDs for i.MX8QM and i.MX8DXL, including dma-ch, esai, gpu1, v2x-mu, seco-mu, hdmi, img and etc. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-17	genpd: imx: scu-pd: enlarge PD range	Peng Fan
	There are 5 LPI2C, 5 LPUART and 32 DMA0 Channel resources per imx_rsrc.h, and they are in i.MX8QM, so enlarge the PD range for them. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-17	genpd: imx: relocate scu-pd under genpd	Peng Fan
	Move scu-pd driver under genpd directory where the driver should be. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-17	cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases	Rafael J. Wysocki
	Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in menu_select() so it first uses the statistics to determine the expected idle duration. If that value is higher than RESIDENCY_THRESHOLD_NS, tick_nohz_get_sleep_length() will be called to obtain the time till the closest timer and refine the idle duration prediction if necessary. This causes the governor to always take the full overhead of get_typical_interval() with the assumption that the cost will be amortized by skipping the tick_nohz_get_sleep_length() call in the cases when the predicted idle duration is relatively very small. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Doug Smythies <dsmythies@telus.net>
2023-08-17	ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()	Rafael J. Wysocki
	Move the acpi_bus_generate_netlink_event() invocation into acpi_thermal_trips_update() which allows the code duplication in acpi_thermal_notify() to be cleaned up, but for this purpose the event value needs to be passed to acpi_thermal_trips_update() and from there to acpi_thermal_adjust_thermal_zone() which has to determine the flag value for __acpi_thermal_trips_update() by itself. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Drop unnecessary thermal zone callbacks	Rafael J. Wysocki
	Drop the .get_trip_type(), .get_trip_temp() and .get_crit_temp() thermal zone callbacks that are not necessary any more from the ACPI thermal driver along with the corresponding callback functions. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Rework thermal_get_trend()	Rafael J. Wysocki
	Rework the ACPI thermal driver's .get_trend() callback function, thermal_get_trend(), so that it does not call thermal_get_trip_type() and thermal_get_trip_temp() which are going to be dropped. This reduces the overhead of the function too, because it will always carry out a trip point lookup once after the change. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Use trip point table to register thermal zones	Rafael J. Wysocki
	Make the ACPI thermal driver use thermal_zone_device_register_with_trips() to register its thermal zones. For this purpose, make it create a trip point table that will be passed to thermal_zone_device_register_with_trips() as an argument. Also use the thermal_zone_update_trip_temp() helper introduced previously to update temperatures of the passive and active trip points after a trip points change notification from the platform firmware. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	thermal: core: Rework and rename __for_each_thermal_trip()	Rafael J. Wysocki
	Rework the currently unused __for_each_thermal_trip() to pass original pointers to struct thermal_trip objects to the callback, so it can be used for updating trip data (e.g. temperatures), rename it to for_each_thermal_trip() and make it available to modular drivers. Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Introduce struct acpi_thermal_trip	Rafael J. Wysocki
	Add struct acpi_thermal_trip to contain the temperature and valid flag of each trip point in the driver's local data structures. This helps to make the subsequent changes more straightforward. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Carry out trip point updates under zone lock	Rafael J. Wysocki
	There is a race condition between acpi_thermal_trips_update() and acpi_thermal_check_fn(), because the trip points may get updated while the latter is running which in theory may lead to inconsistent results. For example, if two trips are updated together, using the temperature value of one of them from before the update and the temperature value of the other one from after the update may not lead to the expected outcome. Moreover, if thermal_get_trend() runs when a trip points update is in progress, it may end up using stale trip point temperatures. To address this, make acpi_thermal_trips_update() call thermal_zone_device_exec() to carry out the trip points update and use a new acpi_thermal_adjust_thermal_zone() wrapper around __acpi_thermal_trips_update() as the callback function for the latter. While at it, change the acpi_thermal_trips_update() return data type to void as that function always returns 0 anyway. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	ACPI: thermal: Clean up acpi_thermal_register_thermal_zone()	Rafael J. Wysocki
	Rename the trips variable in acpi_thermal_register_thermal_zone() to trip_count so its name better reflects the purpose, rearrange white space in the loop over active trips for clarity and reduce code duplication related to calling thermal_zone_device_register() by using an extra local variable to store the passive delay value. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	thermal: core: Introduce thermal_zone_device_exec()	Rafael J. Wysocki
	Introduce a new helper function, thermal_zone_device_exec(), that can be used by drivers to run a given callback routine under the zone lock. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17	cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie	Liao Chang
	The function cppc_freq_invariance_init() may failed to create kworker_fie, make it more robust by setting fie_disabled to FIE_DISBALED to prevent an invalid pointer dereference in kthread_destroy_worker(), which called from cppc_freq_invariance_exit(). Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-08-17	cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.	Liao Chang
	The cpufreq framework used to use the zero of return value to reflect the cppc_cpufreq_get_rate() had failed to get current frequecy and treat all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a negative integer in error case, so it is better to convert the value to zero as the return value of cppc_cpufreq_get_rate(). Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-08-17	cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message	Liao Chang
	When a cpufreq_policy is allocated, the cpus, related_cpus and real_cpus of policy are still unset. Therefore, it is preferable to print the passed 'cpu' parameter instead of a empty 'cpus' cpumask in error message when registering MIN/MAX QoS notifier fails. Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-08-16	net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset	Alfred Lee
	If the switch is reset during active EEPROM transactions, as in just after an SoC reset after power up, the I2C bus transaction may be cut short leaving the EEPROM internal I2C state machine in the wrong state. When the switch is reset again, the bad state machine state may result in data being read from the wrong memory location causing the switch to enter unexpected mode rendering it inoperational. Fixes: a3dcb3e7e70c ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset") Signed-off-by: Alfred Lee <l00g33k@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16	Revert "net: ethernet: ti: am65-cpsw: add mqprio qdisc offload in channel mode"	Jakub Kicinski
	This reverts commit 90bc21aaef4adaefceda2d385756138fc247c0c2. Patch was merged too hastily, Vladimir requested changes in: https://lore.kernel.org/all/20230816121305.5dio5tk3chge2ndh@skbuf/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16	drm/nouveau/disp: fix use-after-free in error handling of ↵	Karol Herbst
	nouveau_connector_create We can't simply free the connector after calling drm_connector_init on it. We need to clean up the drm side first. It might not fix all regressions from commit 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts"), but at least it fixes a memory corruption in error handling related to that commit. Link: https://lore.kernel.org/lkml/20230806213107.GFZNARG6moWpFuSJ9W@fat_crate.local/ Fixes: 95983aea8003 ("drm/nouveau/disp: add connector class") Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230814144933.3956959-1-kherbst@redhat.com
2023-08-16	soc: rockchip: grf: Fix SDMMC not working on RK3588 with bus-width > 1	Ondrej Jirman
	RK3588 has the same issue as other earlier RK SoCs. JTAG functionality muxed to some SDMMC data pins causes issues with SDMMC interface. Without this patch, I can only use SDMMC inteface with bus-width = <1>. (JTAG is muxed to data pins D2 and D3) Signed-off-by: Ondrej Jirman <megi@xff.cz> Link: https://lore.kernel.org/r/20230619011002.2249960-1-megi@xff.cz Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-08-16	genpd: rockchip: Add PD_VO entry for rv1126	Jagan Teki
	PD_VO power-domain entry in RV1126 are connected to - BIU_VO - VOP - RGA - IEP - DSIHOST Add an entry for it. Signed-off-by: Jagan Teki <jagan@edgeble.ai> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20230731110012.2913742-2-jagan@edgeble.ai Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-08-16	net/mlx5: Fix mlx5_cmd_update_root_ft() error flow	Shay Drory
	The cited patch change mlx5_cmd_update_root_ft() to work with multiple peer devices. However, it didn't align the error flow as well. Hence, Fix the error code to work with multiple peer devices. Fixes: 222dd185833e ("{net/RDMA}/mlx5: introduce lag_for_each_peer") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-16	net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT	Dragos Tatulea
	Before this fix, running high rate traffic through XDP_REDIRECT with multibuf could overrun the fifo used to release the xdp frames after tx completion. This resulted in corrupted data being consumed on the free side. The culplirt was a miscalculation of the fifo size: the maximum ratio between fifo entries / data segments was incorrect. This ratio serves to calculate the max fifo size for a full sq where each packet uses the worst case number of entries in the fifo. This patch fixes the formula and names the constant. It also makes sure that future values will use a power of 2 number of entries for the fifo mask to work. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 3f734b8c594b ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-16	Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0""	Alex Deucher
	This reverts commit 27dd79c00aeab36cd7542c7a4481a32549038659. It appears MPC_SPLIT_DYNAMIC still causes problems with multiple displays on DCN2.0 hardware. Switch back to MPC_SPLIT_AVOID_MULT_DISP. This increases power usage with multiple displays, but avoids hangs. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2475 Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.4.x
2023-08-16	drm/amd: flush any delayed gfxoff on suspend entry	Mario Limonciello
	DCN 3.1.4 is reported to hang on s2idle entry if graphics activity is happening during entry. This is because GFXOFF was scheduled as delayed but RLC gets disabled in s2idle entry sequence which will hang GFX IP if not already in GFXOFF. To help this problem, flush any delayed work for GFXOFF early in s2idle entry sequence to ensure that it's off when RLC is changed. commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during suspend without delay") modified power gating flow so that if called in s0ix that it ensured that GFXOFF wasn't put in work queue but instead processed immediately. This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly called as part of the suspend entry code. Remove that dead code. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16	drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix	Tim Huang
	GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are disabled when S0ix suspend but fails to re-enable when resume because of the GFX is in GFXOFF. [ 203.349571] [drm] Fence fallback timer expired on ring sdma0 [ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0 [ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0 For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers to configure the fence driver interrupts for rings that belong to GFX. The interrupts configuration will be restored by GFXOFF exit. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16	drm/amdgpu: skip xcp drm device allocation when out of drm resource	James Zhu
	Return 0 when drm device alloc failed with -ENOSPC in order to allow amdgpu drive loading. But the xcp without drm device node assigned won't be visiable in user space. This helps amdgpu driver loading on system which has more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public, which will support up to 2^20 nodes totally. kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/ libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16	drm/amd/pm: Update pci link width for smu v13.0.6	Asad Kamal
	Update addresses of PCIE link width registers, & link width format used to populate gpu metrics table for smu v13.0.6 v2: Removed ESM register update v3: Updated patch subject and message Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16	drm/amd/pm: Fix temperature unit of SMU v13.0.6	Lijo Lazar
	Temperature needs to be reported in millidegree Celsius. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16	drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7	Umio Yasuno
	Use the right metrics table version based on the firmware. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2720 Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16	drm/amdgpu: disable mcbp if parameter zero is set	Jiadong Zhu
	The parameter amdgpu_mcbp shall have priority against the default value calculated from the chip version. User could disable mcbp by setting the parameter mcbp as zero. v2: do not trigger preemption in sw ring muxer when mcbp is disabled. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16	vfio/type1: fix cap_migration information leak	Stefan Hajnoczi
	Fix an information leak where an uninitialized hole in struct vfio_iommu_type1_info_cap_migration on the stack is exposed to userspace. The definition of struct vfio_iommu_type1_info_cap_migration contains a hole as shown in this pahole(1) output: struct vfio_iommu_type1_info_cap_migration { struct vfio_info_cap_header header; /* 0 8 / __u32 flags; / 8 4 / / XXX 4 bytes hole, try to pack / __u64 pgsize_bitmap; / 16 8 / __u64 max_dirty_bitmap_size; / 24 8 / / size: 32, cachelines: 1, members: 4 / / sum members: 28, holes: 1, sum holes: 4 / / last cacheline: 32 bytes / }; The cap_mig variable is filled in without initializing the hole: static int vfio_iommu_migration_build_caps(struct vfio_iommu iommu, struct vfio_info_cap caps) { struct vfio_iommu_type1_info_cap_migration cap_mig; cap_mig.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION; cap_mig.header.version = 1; cap_mig.flags = 0; / support minimum pgsize / cap_mig.pgsize_bitmap = (size_t)1 << __ffs(iommu->pgsize_bitmap); cap_mig.max_dirty_bitmap_size = DIRTY_BITMAP_SIZE_MAX; return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig)); } The structure is then copied to a temporary location on the heap. At this point it's already too late and ioctl(VFIO_IOMMU_GET_INFO) copies it to userspace later: int vfio_info_add_capability(struct vfio_info_cap caps, struct vfio_info_cap_header cap, size_t size) { struct vfio_info_cap_header header; header = vfio_info_cap_add(caps, size, cap->id, cap->version); if (IS_ERR(header)) return PTR_ERR(header); memcpy(header + 1, cap + 1, size - sizeof(*header)); return 0; } This issue was found by code inspection. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features") Link: https://lore.kernel.org/r/20230801155352.1391945-1-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code	Li Zetao
	Use the module_fsl_mc_driver macro to simplify the code and remove redundant initialization owner in vfio_fsl_mc_driver. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230809131536.4021639-1-lizetao1@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver	Li Zetao
	The cdx_driver_register() will set "THIS_MODULE" to driver.owner when register a cdx_driver driver, so it is redundant initialization to set driver.owner in the statement. Remove it for clean code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Link: https://lore.kernel.org/r/20230808020937.2975196-1-lizetao1@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	drivers: base: test: Add missing MODULE_* macros to root device tests	Maxime Ripard
	Commit 06188bc80ccb ("drivers: base: Add basic devm tests for root devices") introduced a new set of tests for root devices that could be compiled as a module, but didn't have the usual module macros. Make sure they're there. Fixes: 06188bc80ccb ("drivers: base: Add basic devm tests for root devices") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20230816073019.1446155-2-mripard@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16	drivers: base: test: Add missing MODULE_* macros for platform devices tests	Maxime Ripard
	Commit b4cc44301b9d ("drivers: base: Add basic devm tests for platform devices") introduced a new set of tests for platform devices that could be compiled as a module, but didn't have the usual module macros. Make sure they're there. Fixes: b4cc44301b9d ("drivers: base: Add basic devm tests for platform devices") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20230816073019.1446155-1-mripard@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16	vfio/pds: Add Kconfig and documentation	Brett Creeley
	Add Kconfig entries and pds-vfio-pci.rst. Also, add an entry in the MAINTAINERS file for this new driver. It's not clear where documentation for vendor specific VFIO drivers should live, so just re-use the current amd ethernet location. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-9-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/pds: Add support for firmware recovery	Brett Creeley
	It's possible that the device firmware crashes and is able to recover due to some configuration and/or other issue. If a live migration is in progress while the firmware crashes, the live migration will fail. However, the VF PCI device should still be functional post crash recovery and subsequent migrations should go through as expected. When the pds_core device notices that firmware crashes it sends an event to all its client drivers. When the pds_vfio driver receives this event while migration is in progress it will request a deferred reset on the next migration state transition. This state transition will report failure as well as any subsequent state transition requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this reset is done, the migration state will be reset to VFIO_DEVICE_STATE_RUNNING and migration can be performed. If the event is received while no migration is in progress (i.e. the VM is in normal operating mode), then no actions are taken and the migration state remains VFIO_DEVICE_STATE_RUNNING. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-8-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/pds: Add support for dirty page tracking	Brett Creeley
	In order to support dirty page tracking, the driver has to implement the VFIO subsystem's vfio_log_ops. This includes log_start, log_stop, and log_read_and_clear. All of the tracker resources are allocated and dirty tracking on the device is started during log_start. The resources are cleaned up and dirty tracking on the device is stopped during log_stop. The dirty pages are determined and reported during log_read_and_clear. In order to support these callbacks admin queue commands are used. All of the adminq queue command structures and implementations are included as part of this patch. PDS_LM_CMD_DIRTY_STATUS is added to query the current status of dirty tracking on the device. This includes if it's enabled (i.e. number of regions being tracked from the device's perspective) and the maximum number of regions supported from the device's perspective. PDS_LM_CMD_DIRTY_ENABLE is added to enable dirty tracking on the specified number of regions and their iova ranges. PDS_LM_CMD_DIRTY_DISABLE is added to disable dirty tracking for all regions on the device. PDS_LM_CMD_READ_SEQ and PDS_LM_CMD_DIRTY_WRITE_ACK are added to support reading and acknowledging the currently dirtied pages. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20230807205755.29579-7-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/pds: Add VFIO live migration support	Brett Creeley
	Add live migration support via the VFIO subsystem. The migration implementation aligns with the definition from uapi/vfio.h and uses the pds_core PF's adminq for device configuration. The ability to suspend, resume, and transfer VF device state data is included along with the required admin queue command structures and implementations. PDS_LM_CMD_SUSPEND and PDS_LM_CMD_SUSPEND_STATUS are added to support the VF device suspend operation. PDS_LM_CMD_RESUME is added to support the VF device resume operation. PDS_LM_CMD_STATE_SIZE is added to determine the exact size of the VF device state data. PDS_LM_CMD_SAVE is added to get the VF device state data. PDS_LM_CMD_RESTORE is added to restore the VF device with the previously saved data from PDS_LM_CMD_SAVE. PDS_LM_CMD_HOST_VF_STATUS is added to notify the DSC/firmware when a migration is in/not-in progress from the host's perspective. The DSC/firmware can use this to clear/setup any necessary state related to a migration. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-6-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/pds: register with the pds_core PF	Brett Creeley
	The pds_core driver will supply adminq services, so find the PF and register with the DSC services. Use the following commands to enable a VF: echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-5-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	pds_core: Require callers of register/unregister to pass PF drvdata	Brett Creeley
	Pass a pointer to the PF's private data structure rather than bouncing in and out of the PF's PCI function address. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-4-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio/pds: Initial support for pds VFIO driver	Brett Creeley
	This is the initial framework for the new pds-vfio-pci device driver. This does the very basics of registering the PDS PCI device and configuring it as a VFIO PCI device. With this change, the VF device can be bound to the pds-vfio-pci driver on the host and presented to the VM as an ethernet VF. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-3-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	vfio: Commonize combine_ranges for use in other VFIO drivers	Brett Creeley
	Currently only Mellanox uses the combine_ranges function. The new pds_vfio driver also needs this function. So, move it to a common location for other vendor drivers to use. Also, fix RCT ordering while moving/renaming the function. Cc: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20230807205755.29579-2-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16	drm/panel: simple: Fix Innolux G156HCE-L01 LVDS clock	Luca Ceresoli
	This panel has been implemented in commit eae7488814b5 ("drm/panel-simple: Add Innolux G156HCE-L01 panel entry") with a higher clock than the typical one mentioned on the documentation to avoid flickering on the unit tested. Testing on a different unit shows that the panel actually works with the intended 70.93 MHz clock and even lower frequencies so the flickering is likely caused either by a defective unit or by other different components such as the bridge. Fixes: eae7488814b5 ("drm/panel-simple: Add Innolux G156HCE-L01 panel entry") Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Tested-by: Marek Vasut <marex@denx.de> # MX8MM with LT9211 Reviewed-by: Marek Vasut <marex@denx.de> [narmstrong: fixed commit id in commit msg] Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230814134024.397739-1-luca.ceresoli@bootlin.com
2023-08-16	drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0	Kenneth Feng
	drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0 V2: depend on pm.no_fan to check Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16	virtchnl: fix fake 1-elem arrays for structures allocated as `nents`	Alexander Lobakin
	Finally, fix 3 structures which are allocated technically correctly, i.e. the calculated size equals to the one that struct_size() would return, except for sizeof(). For &virtchnl_vlan_filter_list_v2, use the same approach when there are no enough space as taken previously for &virtchnl_vlan_filter_list, i.e. let the maximum size be calculated automatically instead of trying to guestimate it using maths. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16	virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`	Alexander Lobakin
	There are five virtchnl structures, which are allocated and checked in the code as `nents + 1`, meaning that they always have memory for one excessive element regardless of their actual number. This comes from that their sizeof() includes space for 1 element and then they get allocated via struct_size() or its open-coded equivalents, passing the actual number of elements. Expand virtchnl_struct_size() to handle such structures and replace those 1-elem arrays with proper flex ones. Also fix several places which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the virtchnl_ether_addr_list size be computed automatically when there's no enough space for the whole list, otherwise we have to open-code reverse struct_size() logics. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16	virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1	Alexander Lobakin
	The two most problematic virtchnl structures are virtchnl_rss_key and virtchnl_rss_lut. Their "flex" arrays have the type of u8, thus, when allocating / checking, the actual size is calculated as `sizeof + nents - 1 byte`. But their sizeof() is not 1 byte larger than the size of such structure with proper flex array, it's two bytes larger due to the padding. That said, their size is always 1 byte larger unless there are no tail elements -- then it's +2 bytes. Add virtchnl_struct_size() macro which will handle this case (and later other cases as well). Make its calling conv the same as we call struct_size() to allow it to be drop-in, even though it's unlikely to become possible to switch to generic API. The macro will calculate a proper size of a structure with a flex array at the end, so that it becomes transparent for the compilers, but add the difference from the old values, so that the real size of sorta-ABI-messages doesn't change. Use it on the allocation side in IAVF and the receiving side (defined as static inline in virtchnl.h) for the mentioned two structures. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>