linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-05-01	drm/xe: fix devcoredump chunk alignmnent calculation	Arnd Bergmann
	The device core dumps are copied in 1.5GB chunks, which leads to a link-time error on 32-bit builds because of the 64-bit division not getting trivially turned into mask and shift operations: ERROR: modpost: "__moddi3" [drivers/gpu/drm/xe/xe.ko] undefined! On top of this, I noticed that the ALIGN_DOWN() usage here cannot work because that is only defined for power-of-two alignments. Change ALIGN_DOWN into an explicit div_u64_rem() that avoids the link error and hopefully produces the right results. Doing a 1.5GB kvmalloc() does seem a bit suspicious as well, e.g. this will clearly fail on any 32-bit platform and is also likely to run out of memory on 64-bit systems under memory pressure, so using a much smaller power-of-two chunk size might be a good idea instead. v2: - Always call div_u64_rem (Matt) Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504251238.JsNgFeFc-lkp@intel.com/ Fixes: c4a2e5f865b7 ("drm/xe: Add devcoredump chunking") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250501012545.1045247-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-01	drm/amdgpu: Add DPG pause for VCN v5.0.1	Sonny Jiang
	For vcn5.0.1 only, enable DPG PAUSE to avoid DPG resets. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3e5f86c14c3440171f2a3e7a68ceb739297726e9)
2025-05-01	drm/amdgpu: Fix offset for HDP remap in nbio v7.11	Lijo Lazar
	APUs in passthrough mode use HDP flush. 0x7F000 offset used for remapping HDP flush is mapped to VPE space which could get power gated. Use another unused offset in BIF space. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8116a32cdbe456c7f511183eb9ab187e3d590fb) Cc: stable@vger.kernel.org
2025-05-01	drm/amdgpu: Fail DMABUF map of XGMI-accessible memory	Felix Kuehling
	If peer memory is XGMI-accessible, we should never access it through PCIe P2P DMA mappings. PCIe P2P is slower, has different coherence behaviour, limited or no support for atomics, or may not work at all. Fail with a warning if DMABUF mappings of such memory are attempted. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dbe4c63689bc6b5fd3ab72650ea4b6a667e96a68)
2025-05-01	drm/amd/display: Fix slab-use-after-free in hdcp	Chris Bainbridge
	The HDCP code in amdgpu_dm_hdcp.c copies pointers to amdgpu_dm_connector objects without incrementing the kref reference counts. When using a USB-C dock, and the dock is unplugged, the corresponding amdgpu_dm_connector objects are freed, creating dangling pointers in the HDCP code. When the dock is plugged back, the dangling pointers are dereferenced, resulting in a slab-use-after-free: [ 66.775837] BUG: KASAN: slab-use-after-free in event_property_validate+0x42f/0x6c0 [amdgpu] [ 66.776171] Read of size 4 at addr ffff888127804120 by task kworker/0:1/10 [ 66.776179] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.14.0-rc7-00180-g54505f727a38-dirty #233 [ 66.776183] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024 [ 66.776186] Workqueue: events event_property_validate [amdgpu] [ 66.776494] Call Trace: [ 66.776496] <TASK> [ 66.776497] dump_stack_lvl+0x70/0xa0 [ 66.776504] print_report+0x175/0x555 [ 66.776507] ? __virt_addr_valid+0x243/0x450 [ 66.776510] ? kasan_complete_mode_report_info+0x66/0x1c0 [ 66.776515] kasan_report+0xeb/0x1c0 [ 66.776518] ? event_property_validate+0x42f/0x6c0 [amdgpu] [ 66.776819] ? event_property_validate+0x42f/0x6c0 [amdgpu] [ 66.777121] __asan_report_load4_noabort+0x14/0x20 [ 66.777124] event_property_validate+0x42f/0x6c0 [amdgpu] [ 66.777342] ? __lock_acquire+0x6b40/0x6b40 [ 66.777347] ? enable_assr+0x250/0x250 [amdgpu] [ 66.777571] process_one_work+0x86b/0x1510 [ 66.777575] ? pwq_dec_nr_in_flight+0xcf0/0xcf0 [ 66.777578] ? assign_work+0x16b/0x280 [ 66.777580] ? lock_is_held_type+0xa3/0x130 [ 66.777583] worker_thread+0x5c0/0xfa0 [ 66.777587] ? process_one_work+0x1510/0x1510 [ 66.777588] kthread+0x3a2/0x840 [ 66.777591] ? kthread_is_per_cpu+0xd0/0xd0 [ 66.777594] ? trace_hardirqs_on+0x4f/0x60 [ 66.777597] ? _raw_spin_unlock_irq+0x27/0x60 [ 66.777599] ? calculate_sigpending+0x77/0xa0 [ 66.777602] ? kthread_is_per_cpu+0xd0/0xd0 [ 66.777605] ret_from_fork+0x40/0x90 [ 66.777607] ? kthread_is_per_cpu+0xd0/0xd0 [ 66.777609] ret_from_fork_asm+0x11/0x20 [ 66.777614] </TASK> [ 66.777643] Allocated by task 10: [ 66.777646] kasan_save_stack+0x39/0x60 [ 66.777649] kasan_save_track+0x14/0x40 [ 66.777652] kasan_save_alloc_info+0x37/0x50 [ 66.777655] __kasan_kmalloc+0xbb/0xc0 [ 66.777658] __kmalloc_cache_noprof+0x1c8/0x4b0 [ 66.777661] dm_dp_add_mst_connector+0xdd/0x5c0 [amdgpu] [ 66.777880] drm_dp_mst_port_add_connector+0x47e/0x770 [drm_display_helper] [ 66.777892] drm_dp_send_link_address+0x1554/0x2bf0 [drm_display_helper] [ 66.777901] drm_dp_check_and_send_link_address+0x187/0x1f0 [drm_display_helper] [ 66.777909] drm_dp_mst_link_probe_work+0x2b8/0x410 [drm_display_helper] [ 66.777917] process_one_work+0x86b/0x1510 [ 66.777919] worker_thread+0x5c0/0xfa0 [ 66.777922] kthread+0x3a2/0x840 [ 66.777925] ret_from_fork+0x40/0x90 [ 66.777927] ret_from_fork_asm+0x11/0x20 [ 66.777932] Freed by task 1713: [ 66.777935] kasan_save_stack+0x39/0x60 [ 66.777938] kasan_save_track+0x14/0x40 [ 66.777940] kasan_save_free_info+0x3b/0x60 [ 66.777944] __kasan_slab_free+0x52/0x70 [ 66.777946] kfree+0x13f/0x4b0 [ 66.777949] dm_dp_mst_connector_destroy+0xfa/0x150 [amdgpu] [ 66.778179] drm_connector_free+0x7d/0xb0 [ 66.778184] drm_mode_object_put.part.0+0xee/0x160 [ 66.778188] drm_mode_object_put+0x37/0x50 [ 66.778191] drm_atomic_state_default_clear+0x220/0xd60 [ 66.778194] __drm_atomic_state_free+0x16e/0x2a0 [ 66.778197] drm_mode_atomic_ioctl+0x15ed/0x2ba0 [ 66.778200] drm_ioctl_kernel+0x17a/0x310 [ 66.778203] drm_ioctl+0x584/0xd10 [ 66.778206] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ 66.778375] __x64_sys_ioctl+0x139/0x1a0 [ 66.778378] x64_sys_call+0xee7/0xfb0 [ 66.778381] do_syscall_64+0x87/0x140 [ 66.778385] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fix this by properly incrementing and decrementing the reference counts when making and deleting copies of the amdgpu_dm_connector pointers. (Mario: rebase on current code and update fixes tag) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006 Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Fixes: da3fd7ac0bcf3 ("drm/amd/display: Update CP property based on HW query") Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250417215005.37964-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d4673f3c3b3dcb74e36e53cdfc880baa7a87b330) Cc: stable@vger.kernel.org
2025-05-01	net: vertexcom: mse102x: Fix RX error handling	Stefan Wahren
	In case the CMD_RTS got corrupted by interferences, the MSE102x doesn't allow a retransmission of the command. Instead the Ethernet frame must be shifted out of the SPI FIFO. Since the actual length is unknown, assume the maximum possible value. Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250430133043.7722-5-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: vertexcom: mse102x: Add range check for CMD_RTS	Stefan Wahren
	Since there is no protection in the SPI protocol against electrical interferences, the driver shouldn't blindly trust the length payload of CMD_RTS. So introduce a bounds check for incoming frames. Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250430133043.7722-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: vertexcom: mse102x: Fix LEN_MASK	Stefan Wahren
	The LEN_MASK for CMD_RTS doesn't cover the whole parameter mask. The Bit 11 is reserved, so adjust LEN_MASK accordingly. Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250430133043.7722-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: vertexcom: mse102x: Fix possible stuck of SPI interrupt	Stefan Wahren
	The MSE102x doesn't provide any SPI commands for interrupt handling. So in case the interrupt fired before the driver requests the IRQ, the interrupt will never fire again. In order to fix this always poll for pending packets after opening the interface. Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250430133043.7722-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: hns3: defer calling ptp_clock_register()	Jian Shen
	Currently the ptp_clock_register() is called before relative ptp resource ready. It may cause unexpected result when upper layer called the ptp API during the timewindow. Fix it by moving the ptp_clock_register() to the function end. Fixes: 0bf5eb788512 ("net: hns3: add support for PTP") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250430093052.2400464-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: hns3: fixed debugfs tm_qset size	Hao Lan
	The size of the tm_qset file of debugfs is limited to 64 KB, which is too small in the scenario with 1280 qsets. The size needs to be expanded to 1 MB. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250430093052.2400464-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: hns3: fix an interrupt residual problem	Yonglong Liu
	When a VF is passthrough to a VM, and the VM is killed, the reported interrupt may not been handled, it will remain, and won't be clear by the nic engine even with a flr or tqp reset. When the VM restart, the interrupt of the first vector may be dropped by the second enable_irq in vfio, see the issue below: https://gitlab.com/qemu-project/qemu/-/issues/2884#note_2423361621 We notice that the vfio has always behaved this way, and the interrupt is a residue of the nic engine, so we fix the problem by moving the vector enable process out of the enable_irq loop. Fixes: 08a100689d4b ("net: hns3: re-organize vector handle") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250430093052.2400464-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: hns3: store rx VLAN tag offload state for VF	Jian Shen
	The VF driver missed to store the rx VLAN tag strip state when user change the rx VLAN tag offload state. And it will default to enable the rx vlan tag strip when re-init VF device after reset. So if user disable rx VLAN tag offload, and trig reset, then the HW will still strip the VLAN tag from packet nad fill into RX BD, but the VF driver will ignore it for rx VLAN tag offload disabled. It may cause the rx VLAN tag dropped. Fixes: b2641e2ad456 ("net: hns3: Add support of hardware rx-vlan-offload to HNS3 VF driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250430093052.2400464-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	Merge branch '200GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-04-29 (idpf, igc) For idpf: Michal fixes error path handling to remove memory leak. Larysa prevents reset from being called during shutdown. For igc: Jake adjusts locking order to resolve sleeping in atomic context. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: fix lock order in igc_ptp_reset idpf: protect shutdown from reset idpf: fix potential memory leak on kcalloc() failure ==================== Link: https://patch.msgid.link/20250429221034.3909139-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	octeon_ep: Fix host hang issue during device reboot	Sathesh B Edara
	When the host loses heartbeat messages from the device, the driver calls the device-specific ndo_stop function, which frees the resources. If the driver is unloaded in this scenario, it calls ndo_stop again, attempting to free resources that have already been freed, leading to a host hang issue. To resolve this, dev_close should be called instead of the device-specific stop function.dev_close internally calls ndo_stop to stop the network interface and performs additional cleanup tasks. During the driver unload process, if the device is already down, ndo_stop is not called. Fixes: 5cb96c29aa0e ("octeon_ep: add heartbeat monitor") Signed-off-by: Sathesh B Edara <sedara@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250429114624.19104-1-sedara@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: fec: ERR007885 Workaround for conventional TX	Mattias Barthel
	Activate TX hang workaround also in fec_enet_txq_submit_skb() when TSO is not enabled. Errata: ERR007885 Symptoms: NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out commit 37d6017b84f7 ("net: fec: Workaround for imx6sx enet tx hang when enable three queues") There is a TDAR race condition for mutliQ when the software sets TDAR and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles). This will cause the udma_tx and udma_tx_arbiter state machines to hang. So, the Workaround is checking TDAR status four time, if TDAR cleared by hardware and then write TDAR, otherwise don't set TDAR. Fixes: 53bb20d1faba ("net: fec: add variable reg_desc_active to speed things up") Signed-off-by: Mattias Barthel <mattias.barthel@atlascopco.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250429090826.3101258-1-mattiasbarthel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	net: lan743x: Fix memleak issue when GSO enabled	Thangaraj Samynathan
	Always map the `skb` to the LS descriptor. Previously skb was mapped to EXT descriptor when the number of fragments is zero with GSO enabled. Mapping the skb to EXT descriptor prevents it from being freed, leading to a memory leak Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250429052527.10031-1-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations	Sagi Maimon
	On Adva boards, SMA sysfs store/get operations can call __handle_signal_outputs() or __handle_signal_inputs() while the `irig` and `dcf` pointers are uninitialized, leading to a NULL pointer dereference in __handle_signal() and causing a kernel crash. Adva boards don't use `irig` or `dcf` functionality, so add Adva-specific callbacks `ptp_ocp_sma_adva_set_outputs()` and `ptp_ocp_sma_adva_set_inputs()` that avoid invoking `irig` or `dcf` input/output routines. Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support") Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250429073320.33277-1-maimon.sagi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	bnxt_en: fix module unload sequence	Vadim Fedorenko
	Recent updates to the PTP part of bnxt changed the way PTP FIFO is cleared, skbs waiting for TX timestamps are now cleared during ndo_close() call. To do clearing procedure, the ptp structure must exist and point to a valid address. Module destroy sequence had ptp clear code running before netdev close causing invalid memory access and kernel crash. Change the sequence to destroy ptp structure after device close. Fixes: 8f7ae5a85137 ("bnxt_en: improve TX timestamping FIFO configuration") Reported-by: Taehee Yoo <ap420073@gmail.com> Closes: https://lore.kernel.org/netdev/CAMArcTWDe2cd41=ub=zzvYifaYcYv-N-csxfqxUvejy_L0D6UQ@mail.gmail.com/ Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250430170343.759126-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01	drm/rockchip: add CONFIG_OF dependency	Arnd Bergmann
	DRM_DISPLAY_DP_AUX_BUS cannot be selected when CONFIG_OF is disabled: WARNING: unmet direct dependencies detected for DRM_DISPLAY_DP_AUX_BUS Depends on [n]: HAS_IOMEM [=y] && DRM [=y] && OF [=n] Selected by [y]: - DRM_ROCKCHIP [=y] && HAS_IOMEM [=y] && DRM [=y] && ROCKCHIP_IOMMU [=y] && ROCKCHIP_ANALOGIX_DP [=y] Rockchip platforms all depend on OF anyway, so add the dependency here for compile testing. Fixes: d7b4936b2bc0 ("drm/rockchip: analogix_dp: Add support to get panel from the DP AUX bus") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Acked-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250423164422.2793634-1-arnd@kernel.org
2025-05-01	ASoC: stm32: sai: fix kernel rate configuration	Mark Brown
	Merge series from Olivier Moysan <olivier.moysan@foss.st.com>: This patchset adds some checks on kernel minimum rate requirements. This avoids potential clock rate misconfiguration, when setting the kernel frequency on STM32MP2 SoCs.
2025-05-01	Merge tag 'drm-misc-fixes-2025-04-30' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A spurious WARN fix for nouveau, an init and interrupt handling fixes for ivpu, a warning fix for ttm, a hotplug fix for fdinfo, vblank fixes for adp, a memory leak fix for the shmem kunit tests, and a timing fix for mipi-dbi. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250430-dark-eggplant-trout-c4ea6c@houat
2025-04-30	drm/amd/display: Rename program_timing function for better debugging	Antonio Fernando Silva e Cruz Filho
	[WHY] Improve the output when using the ftrace debug feature, making it easier to identify which function is currently being executed. [HOW] Rename the program_timing function to a name that includes the path to the function's file. Signed-off-by: Antonio Fernando Silva e Cruz Filho <fernando.cruz.ctt@gmail.com> Co-developed-by: André Nogueira Ribeiro <r.andrenogueira@gmail.com> Signed-off-by: André Nogueira Ribeiro <r.andrenogueira@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/userq: remove unnecessary NULL check	Dan Carpenter
	The "ticket" pointer points to in the middle of the &exec struct so it can't be NULL. Remove the check. Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/userq: Call unreserve on error in amdgpu_userq_fence_read_wptr()	Dan Carpenter
	This error path should call amdgpu_bo_unreserve() before returning. Fixes: d8675102ba32 ("drm/amdgpu: add vm root BO lock before accessing the vm") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()	Alex Deucher
	When kernel queues are disabled, all GC vmids are available for the scheduler. MM vmids are still managed by the driver so make all 16 available. Also fix gmc 10 vs 11 mix up in commit 1f61fc28b939 ("drm/amdgpu/mes: make more vmids available when disable_kq=1") v2: Properly handle pre-GC 10 hardware Fixes: 1f61fc28b939 ("drm/amdgpu/mes: make more vmids available when disable_kq=1") Cc: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/mes: use correct MES pipe for resets	Alex Deucher
	Use the KIQ pipe for kernel queues and the SCHED pipe for user queues. Fixes: 2408b0272b04 ("drm/amdgpu/mes: consolidate on a single mes reset callback") Cc: Michael Chen <Michael.Chen@amd.com> Cc: Shaoyun Liu <Shaoyun.Liu@amd.com> Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/mes: consolidate on a single mes reset callback	Alex Deucher
	Use the legacy one as it covers both kernel queues and user queues. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/mes: remove more unused functions	Alex Deucher
	These were leftover from mes bring up and are unused. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/userq: fix user_queue parameters list	Bagas Sanjaya
	Sphinx reports htmldocs warning: Documentation/gpu/amdgpu/module-parameters:7: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1119: ERROR: Unexpected indentation. [docutils] Fix the warning by using reST bullet list syntax for user_queue parameter options, separated from preceding paragraph by a blank line. Fixes: fb20954c9717 ("drm/amdgpu/userq: rework driver parameter") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250422202956.176fb590@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: Fix comment style	Lijo Lazar
	Fix code comment style Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504271826.xy2fFO28-lkp@intel.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Fix comment style	Lijo Lazar
	Fix code comment style Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504271422.D6cqMlZ0-lkp@intel.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: Print bootloader status for long waits	Lijo Lazar
	If it needs a long wait for completion of bootloader execution, report the status in between. That helps to know if there is some issue during bootloader execution. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: refine MES register print for devices of hive	Yifan Zha
	[Why] Register access print missed device info. [How] Using dev_xxx instead of DRM_xxx to indicate which device of a hive is the message for. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: Fix query order of XGMI v6.4.1 status	Lijo Lazar
	Keep the register offsets as per link order for querying XGMI v6.4.1 link status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Fixes: 6dee64e765c4 ("drm/amdgpu: Fix xgmi v6.4.1 link status reporting") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Add board voltage node to hwmon	Asad Kamal
	Add and expose board voltage node as vddboard to hwmon for smu_v13_0_6 v2: Replace ip check with supported sensor attribute(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: Fix API status offset for MES queue reset	Jesse.Zhang
	The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset for api_status. Since these functions handle queue reset operations, they should use MESAPI__RESET union instead. This fixes the polling of API status during hardware queue reset operations in the MES for both v11 and v12 versions. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Add voltage caps for smu_v13_0_6	Asad Kamal
	Add & enable board voltage caps for smu_v13_0_6 v3: Update version check for board voltage support Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Fill static metrics data	Asad Kamal
	Fill static metrics data for smu_v13_0_6 v2: Proceed with driver load just with warning even if board voltage reads invalid value Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Use common function to fetch static metrics table	Asad Kamal
	Use common function to fetch static metrics table for smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pn: Fetch static metrics table	Asad Kamal
	Fetch static metrics table for smu_v13_0_6 v2: Add static metrics caps check to fetch static metrics table v3: Update version having all fixes for static metrics support Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Update pmfw headers for smu_v_13_0_6	Asad Kamal
	Update pmfw headers for smu_v_13_0_6 to include static metrics table Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/userq: take the userq_mgr lock in enforce isolation	Alex Deucher
	Add the missing locking. Fixes: 94976e7e5ede ("drm/amdgpu/userq: add helpers to start/stop scheduling") Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu/userq: take the userq_mgr lock in suspend/resume	Alex Deucher
	Add the missing locking. Fixes: 73e12e98ec0c ("drm/amdgpu/userq: add suspend and resume helpers") Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: Add DPG pause for VCN v5.0.1	Sonny Jiang
	For vcn5.0.1 only, enable DPG PAUSE to avoid DPG resets. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/pm: Add ip version check for smu_v13_0_12 functions	Asad Kamal
	Add ip version check to use smu_v13_0_12 specific functions Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/display: downgrade HDMI infoframe error to one time warning	Aurabindo Pillai
	In certain config, a modeprobe test triggers too many instances of the error related to infoframe. Make it print only once, and also make it a warning. Fixes: 6027cbee1900 ("drm/amd/display: Add error check for avi and vendor infoframe setup function") Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Chengjun Yao <Chengjun.Yao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdkfd: add pasid debugfs entries	Eric Huang
	the entries will be appearing at /sys/kernel/debug/kfd/proc/<pid>/pasid_<gpuid>. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ	Arvind Yadav
	DRM_AMDGPU_NAVI3X_USERQ config support is not required for usermode queue. v2: rebase. Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30	drm/amd/display: Fix NULL pointer dereference for program_lut_mode in ↵	Srinivasan Shanmugam
	dcn401_populate_mcm_luts This commit introduces a NULL pointer check for mpc->funcs->program_lut_mode in the dcn401_populate_mcm_luts function. The previous implementation directly called program_lut_mode without validating its existence, which could lead to a NULL pointer dereference. With this change, the function is now only invoked if mpc->funcs->program_lut_mode is not NULL Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn401/dcn401_hwseq.c:720 dcn401_populate_mcm_luts() error: we previously assumed 'mpc->funcs->program_lut_mode' could be null (see line 701) drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn401/dcn401_hwseq.c 642 void dcn401_populate_mcm_luts(struct dc dc, 643 struct pipe_ctx pipe_ctx, 644 struct dc_cm2_func_luts mcm_luts, 645 bool lut_bank_a) 646 { ... 716 } 717 if (m_lut_params.pwl) { 718 if (mpc->funcs->mcm.populate_lut) 719 mpc->funcs->mcm.populate_lut(mpc, m_lut_params, lut_bank_a, mpcc_id); --> 720 mpc->funcs->program_lut_mode(mpc, MCM_LUT_SHAPER, MCM_LUT_ENABLE, lut_bank_a, mpcc_id); Cc: Yihan Zhu <yihanzhu@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Yihan Zhu <yihanzhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>