summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-06-26drm/i915/wm: make struct intel_dbuf_state opaque typeJani Nikula
With all the code touching struct intel_dbuf_state moved inside skl_watermark.c, we move the struct definition there too, and make the type opaque. This nicely reduces includes from skl_watermark.h. Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/83ae5f022a1d6d83c031e5c079b04dc739102565.1750847509.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-26drm/i915/wm: add more accessors to dbuf stateJani Nikula
Add intel_dbuf_num_enabled_slices() and intel_dbuf_num_active_pipes() helpers to avoid looking at struct intel_dbuf_state internals outside of skl_watermark.c. Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/7d555e7b4e93632b732b8b5a3cd4076baf781bee.1750847509.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-26drm/i915/wm: abstract intel_dbuf_pmdemand_needs_update()Jani Nikula
Add intel_dbuf_pmdemand_needs_update() helper to avoid looking at struct intel_dbuf_state internals outside of skl_watermark.c. With this, we can also move to_intel_dbuf_state(), intel_atomic_get_old_dbuf_state(), and intel_atomic_get_new_dbuf_state() inside skl_watermark.c. Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/b493f259d0d3db047151fee18d7e801ad469fa88.1750847509.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-26drm/i915: remove unused DISPLAY_PLANE_FLIP_PENDING() macroJani Nikula
DISPLAY_PLANE_FLIP_PENDING() has been unused since commit fd3a40242e87 ("drm/i915: Rip out legacy page_flip completion/irq handling"). Remove. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250625132140.1564473-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-26drm/i915/display: Implement wa_16011342517Nemesa Garg
While doing voltage swing for type-c phy for DP 1.62 and HDMI write the LOADGEN_SHARING_PMD_DISABLE bit to 1. -v2: Update commit. Add bspec[Suraj] -v3: Move w/a before DKL_TX_PMD_LANE_SUS. Use DKL_TX_DPCNTL2[Ville] -v4: Use intel_encoder_is_dp and intel_encoder_is_hdmi. [Suraj] Bspec: 55359 Signed-off-by: Nemesa Garg <nemesa.garg@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/20250625074911.194085-1-nemesa.garg@intel.com
2025-06-26drm/i915/panel: register drm_panel and call prepare/unprepare for eDPArun R Murthy
Allocate and register drm_panel to allow the panel_follower framework to detect the eDP panel and pass drm_connector::kdev device to drm_panel allocation for matching. Call drm_panel_prepare/unprepare in ddi_enable for eDP to allow the followers to get notified of the panel power state changes. Note: This is for eDP with DDI platforms only. v2: remove backlight setup from panel_register (Jani) v3: Updated the commit message (Jani) Signed-off-by: Arun R Murthy <arun.r.murthy@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/20250624-edp_panel-v3-1-e8197b6d9fde@intel.com
2025-06-25gpu: nova-core: replace `Duration` with `Delta`Alexandre Courbot
The kernel's `Delta` type was not available when the `wait_on` function was introduced. Now that it is, switch to it as it is more compact than `Duration` and cannot panic. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250624-nova-delta-v1-1-b37d75a593ac@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-06-25drm/nouveau/disp: Use dev->dev to get the deviceSakari Ailus
The local variable dev points to drm->dev already, use dev directly. Link: https://lore.kernel.org/r/20250409103344.3661603-1-sakari.ailus@linux.intel.com Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-06-25drm/xe: Do not wedge device on killed exec queuesMatthew Brost
When a user closes an exec queue or interrupts an app with Ctrl-C, this does not warrant wedging the device in mode 2. Avoid this by skipping the wedge check for killed exec queues in the TDR and LR exec queue cleanup worker. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com
2025-06-25drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector typeJayesh Choudhary
By default, HPD was disabled on SN65DSI86 bridge. When the driver was added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable call which was moved to other function calls subsequently. Later on, commit "c312b0df3b13" added detect utility for DP mode. But with HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced state always return 1 (always connected state). Set HPD_DISABLE bit conditionally based on display sink's connector type. Since the HPD_STATE is reflected correctly only after waiting for debounce time (~100-400ms) and adding this delay in detect() is not feasible owing to the performace impact (glitches and frame drop), remove runtime calls in detect() and add hpd_enable()/disable() bridge hooks with runtime calls, to detect hpd properly without any delay. [0]: <https://www.ti.com/lit/gpn/SN65DSI86> (Pg. 32) Fixes: c312b0df3b13 ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP") Cc: Max Krummenacher <max.krummenacher@toradex.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com
2025-06-25Revert "drm/xe/ptl: Apply Wa_16026007364"Daniele Ceraolo Spurio
This reverts commit 3972872e459d812ab5e481a231a6066cf4f4d0f4. There are several things wrong with the way this WA was implemented: - The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't apply it unconditionally. - The KLV requires 2 DWs of data, which are not currently provided. The GuC currently ignores any unknown KLVs, so on versions older that 70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts to parse the KLV and fails due to the missing data, causing a GuC load abort. Given that 70.47.0 is the first GuC version approved for public release for PTL, let's revert this patch so it doesn't cause the GuC load to fail with that blob. We can then re-apply it properly fixed after the GuC definition is merged, which will also have the added benefit of running the KLV addition through CI with the right GuC version. Fixes: 3972872e459d ("drm/xe/ptl: Apply Wa_16026007364") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: sanirban <sk.anirban@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-25drm/i915: fix build error some moreArnd Bergmann
An earlier patch fixed a build failure with clang, but I still see the same problem with some configurations using gcc: drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask': include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON' 116 | BUILD_BUG_ON(bit > As I understand it, the problem is that the function is not always fully inlined, but the __builtin_constant_p() can still evaluate the argument as being constant. Marking it as __always_inline so far works for me in all configurations. Fixes: a7137b1825b5 ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled") Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit ef69f9dd1cd7301cdf04ba326ed28152a3affcf6) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-25gpu: nova-core: consider `clippy::cast_lossless`Danilo Krummrich
Fix all warnings caused by `clippy::cast_lossless`, which is going to be enabled by [1]. Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20250615-ptr-as-ptr-v12-5-f43b024581e8@gmail.com [1] Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250624132337.2242-2-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-06-25gpu: nova-core: impl From for u32 for enums used from register!Danilo Krummrich
Implement From for u32 for all enum types used within the register!() macro. This avoids a conflict with [1] as reported in [2]. Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20250615-ptr-as-ptr-v12-5-f43b024581e8@gmail.com [1] Link: https://lore.kernel.org/all/20250624173114.3be38990@canb.auug.org.au/ [2] Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250624132337.2242-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-06-24drm/i915: reduce stack usage in igt_vma_pin1()Arnd Bergmann
The igt_vma_pin1() function has a rather high stack usage, which gets in the way of reducing the default warning limit: In file included from drivers/gpu/drm/i915/i915_vma.c:2285: drivers/gpu/drm/i915/selftests/i915_vma.c:257:12: error: stack frame size (1288) exceeds limit (1280) in 'igt_vma_pin1' [-Werror,-Wframe-larger-than] There are two things going on here: - The on-stack modes[] array is really large itself and gets constructed for every call, using around 1000 bytes itself depending on the configuration. - The call to i915_vma_pin() gets inlined and adds another 200 bytes for the i915_gem_ww_ctx structure since commit 7d1c2618eac5 ("drm/i915: Take reservation lock around i915_vma_pin.") The second one is easy enough to change, by moving the function into the appropriate .c file. Since it is already large enough to not always be inlined, this seems like a good idea regardless, reducing both the code size and the internal stack usage of each of its 67 callers. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620113644.3844552-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-24drm/i915: fix build error some moreArnd Bergmann
An earlier patch fixed a build failure with clang, but I still see the same problem with some configurations using gcc: drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask': include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON' 116 | BUILD_BUG_ON(bit > As I understand it, the problem is that the function is not always fully inlined, but the __builtin_constant_p() can still evaluate the argument as being constant. Marking it as __always_inline so far works for me in all configurations. Fixes: 686d773186bf ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled") Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-24drm/i915/wm: reduce stack usage in skl_print_wm_changes()Arnd Bergmann
When KMSAN is enabled, this function causes has a rather excessive stack usage: drivers/gpu/drm/i915/display/skl_watermark.c:2977:1: error: stack frame size (1432) exceeds limit (1408) in 'skl_compute_wm' [-Werror,-Wframe-larger-than] This is apparently all caused by the varargs calls to drm_dbg_kms(). Inlining this into skl_compute_wm() means that any function called by skl_compute_wm() has its own stack on top of that. Move the worst bit into a separate function marked as noinline_for_stack to limit that to the one code path that actually needs it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620113748.3869160-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-24drm/xe/guc: Explicitly exit CT safe mode on unwindMichal Wajdeczko
During driver probe we might be briefly using CT safe mode, which is based on a delayed work, but usually we are able to stop this once we have IRQ fully operational. However, if we abort the probe quite early then during unwind we might try to destroy the workqueue while there is still a pending delayed work that attempts to restart itself which triggers a WARN. This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] ------------[ cut here ]------------ [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710 [ ] RIP: 0010:__queue_work+0x287/0x710 [ ] Call Trace: [ ] delayed_work_timer_fn+0x19/0x30 [ ] call_timer_fn+0xa1/0x2a0 Exit the CT safe mode on unwind to avoid that warning. Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com
2025-06-24drm/xe: Process deferred GGTT node removals on device unwindMichal Wajdeczko
While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com
2025-06-24drm/xe/hwmon: Fix xe_hwmon_power_max_writeKarthik Poosa
Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever applicable. (Riana) Fixes: 25a2aa779fc3 ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 8aa7306631f088881759398972d503757cf0c901) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-24drm/xe/display: Add check for alloc_ordered_workqueue()Haoxiang Li
Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/4ee1b0e5d1626ce1dde2e82af05c2edaed50c3aa.1747397638.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 5b62d63395d5b7d4094e7cd380bccae4b25415cb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-24drm/xe: move DPT l2 flush to a more sensible placeMatthew Auld
Only need the flush for DPT host updates here. Normal GGTT updates don't need special flush. Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/xe: Move DSB l2 flush to a more sensible placeMaarten Lankhorst
Flushing l2 is only needed after all data has been written. Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/xe/bmg: Update Wa_22019338487Vinay Belgaumkar
Limit GT max frequency to 2600MHz and wait for frequency to reduce before proceeding with a transient flush. This is really only needed for the transient flush: if L2 flush is needed due to 16023588340 then there's no need to do this additional wait since we are already using the bigger hammer. v2: Use generic names, ensure user set max frequency requests wait for flush to complete (Rodrigo) v3: - User requests wait via wait_var_event_timeout (Lucas) - Close races on flush + user requests (Lucas) - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt rather than root gt (Lucas) v4: - Only apply the freq reducing part if a TDF is needed: L2 flush trumps the need for waiting a lower frequency Fixes: aaa08078e725 ("drm/xe/bmg: Apply Wa_22019338487") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/xe: Split xe_device_td_flush()Lucas De Marchi
xe_device_td_flush() has 2 possible implementations: an entire L2 flush or a transient flush, depending on WA 16023588340. Make this clear by splitting the function so it calls each of them. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/xe/xe_guc_pc: Lock once to update stashed frequenciesLucas De Marchi
pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times to stash the current frequencies. It's not a problem since xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later in the init sequence. However, now that we have _locked() variants for this functions, use them and avoid potential issues when called from other places or using the same pattern. While at it, prefer and early return for the WA check to reduce indentation. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/xe/guc_pc: Add _locked variant for min/max freqLucas De Marchi
There are places in which the getters/setters are called one after the other causing a multiple lock()/unlock(). These are not currently a problem since they are all happening from the same thread, but there's a race possibility as calls are added outside of the early init when the max/min and stashed values need to be correlated. Add the _locked() variants to prepare for that. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24drm/amd/display: Add sanity checks for drm_edid_raw()Takashi Iwai
When EDID is retrieved via drm_edid_raw(), it doesn't guarantee to return proper EDID bytes the caller wants: it may be either NULL (that leads to an Oops) or with too long bytes over the fixed size raw_edid array (that may lead to memory corruption). The latter was reported actually when connected with a bad adapter. Add sanity checks for drm_edid_raw() to address the above corner cases, and return EDID_BAD_INPUT accordingly. Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Link: https://bugzilla.suse.com/show_bug.cgi?id=1236415 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 648d3f4d209725d51900d6a3ed46b7b600140cdf) Cc: stable@vger.kernel.org
2025-06-24drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL valueMario Limonciello
[Why] commit 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") adjusted the brightness range to scale to larger values, but missed updating AMDGPU_MAX_BL_LEVEL which is needed to make sure that scaling works properly with custom brightness curves. [How] As the change for max brightness of 0xFFFF only applies to devices supporting DC, use existing DC define MAX_BACKLIGHT_LEVEL. Fixes: 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250623171114.1156451-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5b852044eb0d3e1f1c946d32e05fcb068e0a20a0) Cc: stable@vger.kernel.org
2025-06-24drm/amdgpu/sdma7: add ucode version checks for userq supportAlex Deucher
SDMA 7.0.0/1: 7836028 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8c011408ed842dfccdd50a90a9cf6bccdb85cc0e)
2025-06-24drm/amdgpu/sdma6: add ucode version checks for userq supportAlex Deucher
SDMA 6.0.0 version 24 SDMA 6.0.2 version 21 SDMA 6.0.3 version 25 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e8cca30d8b34f1c4101c237914c53068d4a55e73)
2025-06-24drm/amd: Adjust output for discovery error handlingMario Limonciello
commit 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Fixes: 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 49f1f9f6c3c9febf8ba93f94a8d9c8d03e1ea0a1)
2025-06-24drm/amdgpu/mes: add compatibility checks for set_hw_resource_1Alex Deucher
Seems some older MES firmware versions do not properly support this packet. Add back some the compatibility checks. v2: switch to fw version check (Shaoyun) Fixes: f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295 Cc: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: shaoyun.liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0180e0a5dd5c6ff118043ee42dbbbddaf881f283) Cc: stable@vger.kernel.org
2025-06-24drm/amdgpu/gfx9: Add Cleaner Shader Support for GFX9.x GPUsSrinivasan Shanmugam
Enable the cleaner shader for other GFX9.x series of GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX9.x GPUs, previously available for GFX9.4.2. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Manu Rastogi <manu.rastogi@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 99808926d0ea6234a89e35240a7cb088368de9e1)
2025-06-24drm/amd/display: Add sanity checks for drm_edid_raw()Takashi Iwai
When EDID is retrieved via drm_edid_raw(), it doesn't guarantee to return proper EDID bytes the caller wants: it may be either NULL (that leads to an Oops) or with too long bytes over the fixed size raw_edid array (that may lead to memory corruption). The latter was reported actually when connected with a bad adapter. Add sanity checks for drm_edid_raw() to address the above corner cases, and return EDID_BAD_INPUT accordingly. Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Link: https://bugzilla.suse.com/show_bug.cgi?id=1236415 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amd/pm: revise the pcie dpm parametersKenneth Feng
revise the pcie dpm parameters Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amd/display: Add a trace event for brightness programmingMario Limonciello
[Why] Brightness programming may involve a conversion of a user requested brightness against what was in a custom brightness curve. The values might not match what a user programmed. [How] Add a new trace event to show specific converted brightness values. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250623171114.1156451-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL valueMario Limonciello
[Why] commit 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") adjusted the brightness range to scale to larger values, but missed updating AMDGPU_MAX_BL_LEVEL which is needed to make sure that scaling works properly with custom brightness curves. [How] As the change for max brightness of 0xFFFF only applies to devices supporting DC, use existing DC define MAX_BACKLIGHT_LEVEL. Fixes: 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250623171114.1156451-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amd: Fix spelling mistake "correctalbe" -> "correctable"Colin Ian King
There is a spelling mistake in a pr_info message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu/sdma7: add ucode version checks for userq supportAlex Deucher
SDMA 7.0.0/1: 7836028 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu/sdma6: add ucode version checks for userq supportAlex Deucher
SDMA 6.0.0 version 24 SDMA 6.0.2 version 21 SDMA 6.0.3 version 25 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Add more checks to PSP mailboxLijo Lazar
Instead of checking the response flag, use status mask also to check against any unexpected failures like a device drop. Also, log error if waiting on a psp response fails/times out. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert init_mem_ranges into common helpersHawking Zhang
They can be shared across multiple products Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Generalize is_multi_chiplet with a common helper v2Hawking Zhang
It is not necessary to be ip generation specific v2: rename the helper to is_multi_aid (Lijo) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert query_memory_partition into common helpersHawking Zhang
The query_memory_partition does not need to remain as soc specific callbacks. They can be shared across multiple products Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Move MAX_MEM_RANGES to amdgpu_gmc.hHawking Zhang
This relocation allows MAX_MEM_RANGES to be shared across multiple products Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert pre|post_partition_switch into common helpersHawking Zhang
So they can be reused for future products Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert update_supported_modes into a common helperHawking Zhang
So it can be used for future products Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert update_partition_sched_list into a common helper v3Hawking Zhang
The update_partition_sched_list function does not need to remain as a soc specific callback. It can be reused for future products. v2: bypass the function if xcp_mgr is not available (Likun) v3: Let caller check the availability of xcp_mgr (Lijo) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: Convert select_sched into a common helper v3Hawking Zhang
The xcp select_sched function does not need to remain as a soc specific callback. It can be reused for future products v2: bypass the function if xcp_mgr is not available (Likun) v3: Let caller check the availability of xcp mgr (Lijo) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>