Age | Commit message | Author
2024-10-08 | drm/xe/guc: Add XE_LP steered register lists | Zhanjun Dong
Add the ability for runtime allocation and freeing of steered register list extensions that depend on the detected HW config fuses.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-3-zhanjun.dong@intel.com
2024-10-08 | drm/xe/guc: Prepare GuC register list and update ADS size for error capture | Zhanjun Dong
Add defines for the referenced registers and the list of registers. Update the GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to the host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware.
Ensure we allocate persistent storage for the register lists that are populated into ADS, so that we don't need to allocate memory during GT resets, when GuC is reloaded and ADS population happens again.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
2024-10-08 | drm/xe/xe3lpm: Add new "instance0" steering table | Matt Roper
MCR steering on Xe3 media IP is almost the same as it was on Xe2, except for one new range (0x38D0D0 - 0x38D0FF) which has changed to an MCR "MEDIAINF" range on Xe3. Since we can always steer to grpid / instanceid 0 for MEDIAINF ranges, define a new "INSTANCE0" steering table for Xe3 media. Xe3 can continue to use the same OADDRM/GPMXMT table as Xe2.
v2: Merge contiguous entries 38D0D0 - 38F0FF
Bspec: 74298
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-7-matthew.s.atwood@intel.com
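A minimal sketch of how an INSTANCE0-style table can be consulted, assuming a simplified range table with inclusive bounds. The type and function names are hypothetical, and only the merged 0x38D0D0 - 0x38F0FF range from the v2 note is included; a real steering table would likely hold more ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MMIO range with inclusive bounds. */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Hypothetical INSTANCE0 table: only the merged range from the v2 note.
 * A zero 'end' terminates the table.
 */
static const struct mmio_range instance0_table[] = {
	{ 0x38D0D0, 0x38F0FF },
	{ 0, 0 },
};

static bool reg_in_table(const struct mmio_range *table, uint32_t reg)
{
	for (size_t i = 0; table[i].end; i++)
		if (reg >= table[i].start && reg <= table[i].end)
			return true;
	return false;
}

/*
 * MEDIAINF-type ranges can always be steered to group 0 / instance 0, so a
 * hit in the INSTANCE0 table needs no fuse-dependent lookup.
 */
static bool steer_instance0(uint32_t reg, unsigned int *group,
			    unsigned int *instance)
{
	if (!reg_in_table(instance0_table, reg))
		return false;
	*group = 0;
	*instance = 0;
	return true;
}
```

Returning false leaves the caller free to try the next steering table, such as the OADDRM/GPMXMT one that Xe3 shares with Xe2.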
2024-10-08 | drm/xe/ptl: Add PTL platform definition | Haridhar Kalvala
PTL is an integrated GPU based on the Xe3 architecture.
v2: explicitly turn off display until display patches land.
Bspec: 72574
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-6-matthew.s.atwood@intel.com
2024-10-08 | drm/xe/ptl: PTL re-uses Xe2 MOCS table | Haridhar Kalvala
PTL is an Xe3 architecture platform, but its MOCS table is no different from LNL's, so PTL uses the same MOCS table as LNL.
Bspec: 71582
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-5-matthew.s.atwood@intel.com
2024-10-08 | drm/xe/xe3: Define Xe3 feature flags | Haridhar Kalvala
Define a common set of Xe3 feature flags and definitions that will be used for all platforms in this family. The feature flags are inherited unchanged from the Xe2 (XE2_FEATURES) platform. The following Bspec details are inherited from the Xe2 feature flag definition commit.
v2: reuse graphics_xe2 definition
Bspec: 58695
- dma_mask_size remains 46 (not documented in bspec)
- supports_usm=1 (Bspec 59651)
- has_flatccs=1 (Bspec 58797)
- has_4tile=1 (Bspec 58788)
- has_asid=1 (Bspec 59654, 59265, 60288)
- has_range_tlb_invalidate=1 (Bspec 71126)
- five-level page table (Bspec 59505)
- 1 VD + 1 VE + 1 SFC (Bspec 67103, 70819)
- platform engine mask (Bspec 60149)
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-3-matthew.s.atwood@intel.com
2024-10-08 | drm/xe/xe3: Xe3 uses the same PAT settings as Xe2 | Matt Roper
Xe3 platforms use the same PAT tables as Xe2.
Bspec: 71582
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-2-matthew.s.atwood@intel.com
2024-10-08 | drm/xe/ptl: L3bank mask is not available on the media GT | Shekhar Chauhan
On PTL platforms with media version 30.00, the fuse registers for reporting L3 bank availability to the GT just read out as ~0 and do not provide proper values. Xe does not use the L3 bank mask for anything internally; it only passes the mask through to userspace via the GT topology query. Since we don't have any way to get the real L3 bank mask, we don't want to pass garbage to userspace. Passing a zeroed mask or a copy of the primary GT's L3 bank mask would also be inaccurate and likely to cause confusion for userspace. The best approach is to simply not include L3 in the list of masks returned by the topology query in cases where we aren't able to provide a meaningful value. This won't change the behavior for any existing platforms (where we can always obtain L3 masks successfully for all GTs); it will only prevent us from mis-reporting bad information on upcoming platform(s).
There's a good chance this will become a formal workaround in the future, but for now we don't have a lineage number, so "no_media_l3" is used in place of a lineage as the OOB workaround descriptor.
v2:
- Re-calculate query size to properly match data returned. (Gustavo)
- Update kerneldoc to clarify that the L3bank mask may not be included in the query results if the hardware doesn't make it available. (Gustavo)
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Acked-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241007154143.2021124-2-matthew.d.roper@intel.com
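The v2 note about recalculating the query size can be illustrated with a sketch that sizes a topology reply and simply omits the L3 record for a GT whose L3 fuses are unusable. The struct, the `has_l3_mask` flag and the 8-byte per-record header are assumptions for illustration, not the xe uAPI layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-GT mask sizes in bytes; names are illustrative only. */
struct gt_topology {
	size_t dss_geometry_bytes;
	size_t dss_compute_bytes;
	size_t eu_bytes;
	size_t l3bank_bytes;
	bool has_l3_mask;	/* false when the fuses read back as ~0 */
};

/* Hypothetical size of the header preceding each mask in the reply. */
#define MASK_HEADER_BYTES 8

static size_t topology_query_size(const struct gt_topology *gts, size_t num_gt)
{
	size_t size = 0;

	for (size_t i = 0; i < num_gt; i++) {
		size += MASK_HEADER_BYTES + gts[i].dss_geometry_bytes;
		size += MASK_HEADER_BYTES + gts[i].dss_compute_bytes;
		size += MASK_HEADER_BYTES + gts[i].eu_bytes;
		/* Leave the L3 record out entirely rather than report garbage. */
		if (gts[i].has_l3_mask)
			size += MASK_HEADER_BYTES + gts[i].l3bank_bytes;
	}
	return size;
}
```

Sizing and filling must follow the same skip rule; otherwise userspace would see a reply length that does not match the records actually written.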
2024-10-08 | drm/radeon: always set GEM function pointer | Christian König
Make sure to always set the GEM function pointer, even for in-kernel allocations. This fixes a NULL pointer dereference caused by switching to GEM references.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: fd69ef05029f ("drm/radeon: use GEM references instead of TTMs")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: fix dm_suspend/resume arguments to ip_block | Sunil Khatri
Fixes the "build failure after merge of the amdgpu tree": the dm_suspend/dm_resume argument mismatch was not caught in validation because the code is under CONFIG_DEBUG_KERNEL_DC, which isn't enabled by default. Change the argument from adev to ip_block.
Fixes: 982d7f9bfe4a ("drm/amdgpu: update the handle ptr in suspend")
Fixes: 7feb4f3ad8be ("drm/amdgpu: update the handle ptr in resume")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: no need to log error in multi ring write | Sunil Khatri
There is no need to log an error in multi ring write, as it is taken care of during ring commit. This is in line with the change made in amdgpu_ring_write.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: move error log from ring write to commit | Sunil Khatri
Move the error message from ring write to ring commit as an optimization: instead of printing it on every write, print it once during commit if the writes exceeded the allocated size, i.e. ring->count_dw.
We also do not want to log the error message between a ring write and the completion of the write, as the overrun is mostly harmless: it only overwrites stale data, because the GPU reads from the ring faster than the CPU writes to it.
This reduces the size of the amdgpu.ko module by around 600 KB, because ring write is a very frequently used function, and so was the print it contained.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
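A minimal sketch of the pattern, assuming a simplified ring with a caller-reserved dword budget. The struct and both functions are hypothetical stand-ins, not amdgpu_ring_write()/amdgpu_ring_commit(): the write path only decrements the budget, and the single overrun check moves to commit.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified ring; not the amdgpu_ring layout. */
struct ring {
	uint32_t buf[8];
	unsigned int mask;	/* number of slots - 1 (power of two) */
	unsigned int wptr;
	int count_dw;		/* dwords the caller reserved for this submission */
};

/* Hot path: no check and no message, just store and account. */
static void ring_write(struct ring *r, uint32_t v)
{
	r->buf[r->wptr++ & r->mask] = v;
	r->count_dw--;
}

/* Commit: report an overrun once, instead of on every write. */
static int ring_commit(struct ring *r)
{
	if (r->count_dw < 0) {
		fprintf(stderr, "wrote %d more dwords than reserved\n",
			-r->count_dw);
		return -1;
	}
	return 0;
}
```

Because the message now lives in one out-of-line function instead of at every inlined write site, the per-call-site code shrinks, which is consistent with the module size reduction described above.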
2024-10-08 | drm/amdgpu: fix typos | Andrew Kreimer
Fix typos in comments: "wether -> whether".
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job | Tvrtko Ursulin
The while loop makes it look like amdgpu_vmid_grab() may need to be called multiple times to produce a fence, while in reality all code paths either return an error, assign a valid job->vmid, or assign a vmid which will be valid once the returned fence signals. Therefore we can remove the loop to make it clear the call does not need to be repeated.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: Drop impossible condition from amdgpu_job_prepare_job | Tvrtko Ursulin
The fence has been initialised to NULL, so there is no need to test it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amd/display: disable SG displays on cyan skillfish | Alex Deucher
These parts were mainly for compute workloads, but they have a display that was available for the console. These chips should support SG display, but I don't know whether the support was ever validated on Linux, so disable it by default. It can still be enabled by setting sg_display=1 for those who want to play with it. These systems also generally had large carve-outs, so SG display was less of a factor.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3356
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm/amdgpu: Use drm_print_memory_stats helper from fdinfo | Tvrtko Ursulin
Convert fdinfo memory stats to use the common drm_print_memory_stats helper.
This achieves alignment with the common keys as documented in drm-usage-stats.rst, specifically adding the drm-total- key, which the driver was missing until now.
Additionally, I made the code stop skipping the total size for objects which currently do not have a backing store, and I added resident, active and purgeable reporting.
Legacy keys have been preserved, with the outlook of potentially removing only drm-memory- when the time is right.
The example output now looks like this:
  pos: 0
  flags: 02100002
  mnt_id: 24
  ino: 1239
  drm-driver: amdgpu
  drm-client-id: 4
  drm-pdev: 0000:04:00.0
  pasid: 32771
  drm-total-cpu: 0
  drm-shared-cpu: 0
  drm-active-cpu: 0
  drm-resident-cpu: 0
  drm-purgeable-cpu: 0
  drm-total-gtt: 2392 KiB
  drm-shared-gtt: 0
  drm-active-gtt: 0
  drm-resident-gtt: 2392 KiB
  drm-purgeable-gtt: 0
  drm-total-vram: 44564 KiB
  drm-shared-vram: 31952 KiB
  drm-active-vram: 0
  drm-resident-vram: 44564 KiB
  drm-purgeable-vram: 0
  drm-memory-vram: 44564 KiB
  drm-memory-gtt: 2392 KiB
  drm-memory-cpu: 0 KiB
  amd-memory-visible-vram: 44564 KiB
  amd-evicted-vram: 0 KiB
  amd-evicted-visible-vram: 0 KiB
  amd-requested-vram: 44564 KiB
  amd-requested-visible-vram: 11952 KiB
  amd-requested-gtt: 2392 KiB
  drm-engine-compute: 46464671 ns
v2:
* Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
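The unit choice visible in the sample (a bare 0 for empty regions, KiB for the others) can be reproduced with the sketch below. It assumes the value is divided by 1024 only while it stays an exact multiple, capped at MiB; that rule is our reading of the output rather than a quote of drm_print_memory_stats(), and the helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static const char *const size_units[] = { "", " KiB", " MiB" };

/*
 * Hypothetical helper: scale a byte count down by 1024 only while it stays
 * an exact multiple, stopping at MiB. Returns the index into size_units[].
 */
static unsigned int scale_size(uint64_t *sz)
{
	unsigned int u;

	for (u = 0; u < 2; u++) {
		if (*sz == 0 || *sz % 1024)
			break;
		*sz /= 1024;
	}
	return u;
}

/* Hypothetical printer for one "drm-<stat>-<region>:" key. */
static void print_size(FILE *out, const char *stat, const char *region,
		       uint64_t sz)
{
	unsigned int u = scale_size(&sz);

	fprintf(out, "drm-%s-%s: %" PRIu64 "%s\n", stat, region, sz,
		size_units[u]);
}
```

For example, print_size(stdout, "total", "gtt", 2392 * 1024ULL) prints the drm-total-gtt line from the sample, while a zero size prints a bare 0 with no unit.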
2024-10-08 | drm/amdgpu: Drop unused fence argument from amdgpu_vmid_grab_used | Tvrtko Ursulin
The fence argument is unused, so let's drop it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 | drm: use drm_file client_name in fdinfo | Pierre-Eric Pelloux-Prayer
Add an optional drm-client-name field to drm fdinfo's output.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003124506.470931-3-pierre-eric.pelloux-prayer@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-10-08 | drm: add DRM_SET_CLIENT_NAME ioctl | Pierre-Eric Pelloux-Prayer
Giving userspace the opportunity to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl.
Access to client_name is protected by a mutex, and the 'clients' debugfs file has been updated to print it.
Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428
If the string passed by userspace contains chars that would mess up the output when it's printed (in dmesg, fdinfo, etc.), -EINVAL is returned.
A 0-length string is a valid use, and clears the existing name.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003124506.470931-2-pierre-eric.pelloux-prayer@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
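A sketch of the validation rule described above. It assumes "chars that would mess up output" means anything that is not a printable, non-space character (as tested by isgraph()), and it uses an arbitrary 64-byte limit; neither the limit nor the helper name comes from the commit.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical length limit; the commit message does not state one. */
#define CLIENT_NAME_MAX_LEN 64

/*
 * Hypothetical validator. Returns 0 if the name is acceptable (len == 0 is
 * valid and means "clear the name"), or -EINVAL otherwise.
 */
static int validate_client_name(const char *name, size_t len)
{
	if (len > CLIENT_NAME_MAX_LEN)
		return -EINVAL;
	for (size_t i = 0; i < len; i++)
		if (!isgraph((unsigned char)name[i]))	/* rejects spaces, \n, ESC... */
			return -EINVAL;
	return 0;
}
```

Rejecting whitespace as well as control characters keeps one-line formats such as fdinfo's `key: value` parseable, at the cost of disallowing names with spaces.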
2024-10-08 | drm/i915/psr: Implement Wa 14019834836 | Jouni Högander
This patch implements HW workaround 14019834836 for display version 30.
v2:
- move Wa 14019834836 to its own function
- apply only for display version 30
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240926064759.1313335-3-jouni.hogander@intel.com
2024-10-08 | drm/i915/psr: Add new SU area calculation helper to apply workarounds | Jouni Högander
intel_psr2_sel_fetch_update is already quite a long function, and we are about to add one more HW workaround. Let's split applying workarounds to the selective update area into a separate function.
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240926064759.1313335-2-jouni.hogander@intel.com
2024-10-07 | drm/xe/guc: Add a helper function for dumping GuC log to dmesg | John Harrison
Create a helper function that can be used to dump the GuC log to dmesg in a manner that is reliable for extraction and decode. The intention is that calls to this can be added by developers when debugging specific issues that require a GuC log but do not allow easy capture of the log, e.g. failures in selftests and failures that lead to kernel hangs.
Also note that this is really a temporary stop-gap. The aim is to allow on-demand creation and dumping of devcoredump captures (which include the GuC log and much more). Currently this is not possible, as much of the devcoredump code requires a 'struct xe_sched_job' and those are not available at many places that might want to do the dump.
v2: Add kerneldoc - review feedback from Michal W.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-12-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/guc: Add GuC log to devcoredump captures | John Harrison
Include the GuC log in devcoredump captures, because it can be useful for debugging certain types of bug.
v2: Fix kerneldoc
v3: Drop module parameter as now using more compact ascii85 encoding rather than hexdump (although still not compressed) (review feedback from Matthew B). Rebase onto recent refactoring of devcoredump code.
v4: Don't move the submission snapshot inside the GuC internals structure, because it really doesn't belong there.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-11-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/guc: Dump entire CTB on errors | John Harrison
The dump of the CT buffers was only showing the unprocessed data, which is not generally useful for saying why a hang occurred, because the hang was probably caused by the commands that were just processed. So save and dump the entire buffer, but in a more compact dump format. Also zero-fill it on allocation to avoid confusion over uninitialised data in the dump.
v2: Add kerneldoc - review feedback from Michal W.
v3: Fix kerneldoc.
v4: Use ascii85 instead of hexdump (review feedback from Matthew B).
v5: Dump the entire CTB object rather than separately dumping just the H2G and G2H sections. That way it includes the full header info.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-10-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/guc: Dead CT helper | John Harrison
Add a worker function helper for asynchronously dumping state when an internal/fatal error is detected in CT processing. Being asynchronous is required to avoid deadlocks and scheduling-while-atomic or process-stalled-for-too-long issues. Also check for a bunch more error conditions and improve the handling of some existing checks.
v2: Use compile time CONFIG check for new (but not directly CT_DEAD related) checks and use unsigned int for a bitmask, rename CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments, rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W. Drop CT_DEAD_ALIVE, as there is no need for a bitfield define just to set the entire mask to zero.
v3: Fix kerneldoc
v4: Nullify some floating pointers after free.
v5: Add section headings and device info to make the state dump look more like a devcoredump to allow parsing by the same tools (eventual aim is to just call the devcoredump code itself, but that currently requires an xe_sched_job, which is not available in the CT code).
v6: Fix potential for leaking snapshots with concurrent error conditions (review feedback from Julia F).
v7: Don't complain about unexpected G2H messages yet because there is a known issue causing them. Fix bit shift bug with v6 change. Add GT id to fake coredump headers and use puts instead of printf.
v8: Disable the head mis-match check in g2h_read because it is failing on various discrete platforms due to unknown reasons.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-9-John.C.Harrison@Intel.com
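A sketch of the "record now, dump later" split, using C11 atomics in place of kernel primitives and a plain flag in place of a workqueue. The reason names and all functions are hypothetical. Only the first error asks the caller to queue the worker; later errors just OR their bit in, which is one way to avoid capturing (and leaking) a second snapshot when failures happen concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reason bits; not the driver's CT_DEAD_* values. */
enum {
	DEAD_H2G_WRITE = 1u << 0,
	DEAD_G2H_READ = 1u << 1,
	DEAD_PARSE_G2H = 1u << 2,
};

struct ct_dead {
	atomic_uint reasons;	/* accumulated error bits */
	atomic_bool queued;	/* a dump is pending or has been taken */
};

/*
 * Error path (may be atomic context): only record the reason. Returns true
 * for the first error, i.e. when the caller should queue the worker.
 */
static bool ct_dead_note(struct ct_dead *d, unsigned int reason)
{
	atomic_fetch_or(&d->reasons, reason);
	return !atomic_exchange(&d->queued, true);
}

/* Worker (process context): take all reasons at once; the dump goes here. */
static unsigned int ct_dead_take(struct ct_dead *d)
{
	return atomic_exchange(&d->reasons, 0);
}

/* Re-arm so that a later, independent failure is reported again. */
static void ct_dead_rearm(struct ct_dead *d)
{
	atomic_store(&d->queued, false);
}
```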
2024-10-07 | drm/print: Introduce drm_line_printer | Michal Wajdeczko
This drm printer wrapper can be used to increase the robustness of the captured output generated by any other drm_printer: by adding line numbers to each output line, it makes it possible to confirm that no intermediate lines of the output were lost. Helpful for capturing some crash data.
v2: Extended short int counters to full int (JohnH)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-8-John.C.Harrison@Intel.com
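A minimal sketch of the wrapper idea, using a hypothetical callback-based printer rather than the real struct drm_printer. The "<prefix> <series>.<line>: " format is an assumption for illustration and not necessarily what drm_line_printer emits; the sketch also assumes each call carries exactly one line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal printer: an output callback and its context. */
struct printer {
	void (*emit)(void *ctx, const char *str);
	void *ctx;
};

/* Hypothetical wrapper state: each line gets "<prefix> <series>.<line>: ". */
struct line_printer {
	struct printer *inner;
	const char *prefix;
	unsigned int series;
	unsigned int line;
};

/* Assumes each call carries exactly one line of output. */
static void line_printer_puts(struct line_printer *lp, const char *str)
{
	char head[64];

	lp->line++;
	snprintf(head, sizeof(head), "%s %u.%u: ", lp->prefix, lp->series,
		 lp->line);
	lp->inner->emit(lp->inner->ctx, head);
	lp->inner->emit(lp->inner->ctx, str);
}

/* Example sink that appends to a fixed buffer, handy for capture. */
struct buf_sink {
	char data[256];
};

static void buf_emit(void *ctx, const char *str)
{
	struct buf_sink *b = ctx;

	strncat(b->data, str, sizeof(b->data) - strlen(b->data) - 1);
}
```

A gap in the numbering (say 1.7 followed by 1.9) shows that a line went missing somewhere between the printer and the captured log, while bumping `series` separates one dump from the next.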
2024-10-07 | drm/xe/guc: Use a two stage dump for GuC logs and add more info | John Harrison
Split the GuC log dump into a two stage snapshot and print mechanism. This allows the log to be captured at the point of an error (which may be in a restricted context) and then dumped out later (from a regular context such as a worker function or a sysfs file handler).
Also add a bunch of other useful pieces of information that can help (or are fundamentally required!) to decode and parse the log.
v2: Add kerneldoc and fix a couple of comment typos - review feedback from Michal W.
v3: Move chunking code to this patch as it makes the deltas simpler. Fix a bunch of kerneldoc issues.
v4: Move the CS frequency out of the coredump snapshot function into the debugfs only code (as that info is already part of the main devcoredump). Add a header to the debugfs log to match the one in the devcoredump to aid processing by a unified tool. Add forcewake to the GuC timestamp read so it actually works.
v6: Add colon to GuC version string (review feedback by Julia F).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-7-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/guc: Copy GuC log prior to dumping | John Harrison
Add an extra stage to the GuC log print to copy the log buffer into regular host memory first, rather than printing the live GPU buffer object directly. Doing so helps prevent inconsistencies due to the log being updated as it is being dumped. It also allows the use of the ASCII85 helper function for printing the log in a more compact form than a straight hex dump.
v2: Use %zx instead of %lx for size_t prints.
v3: Replace hexdump code with ascii85 call (review feedback from Matthew B). Move chunking code into next patch as that reduces the deltas of both.
v4: Add a prefix to the ASCII85 output to aid tool parsing.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-6-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/devcoredump: Add ASCII85 dump helper function | John Harrison
There is a need to include the GuC log and other large binary objects in core dumps and via dmesg. So add a helper for dumping to a printer function via conversion to ASCII85 encoding.
Another issue with dumping such a large buffer is that it can be slow, especially if dumping to dmesg over a serial port. So add a yield to prevent the 'task has been stuck for 120s' kernel hang check feature from firing.
v2: Add a prefix to the output string. Fix memory allocation bug.
v3: Correct a string size calculation and clean up a define (review feedback from Julia F).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-5-John.C.Harrison@Intel.com
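For reference, ASCII85 turns each 32-bit word into five base-85 digits offset by '!' (0x21), with 'z' as the usual shorthand for an all-zero word; that is 5 characters per 4 bytes, versus 8 hex digits for a hex dump. The kernel has a similar helper in include/linux/ascii85.h. The sketch below is a standalone version of the per-word step only; the function name is ours, and how a buffer is split into words (byte order) is left to the caller.

```c
#include <stdint.h>

/*
 * Encode one 32-bit word as ASCII85: five base-85 digits, most significant
 * first, each offset by '!'. An all-zero word collapses to "z".
 * out must have room for 6 chars (5 digits + NUL).
 */
static const char *ascii85_word(uint32_t in, char out[6])
{
	if (in == 0)
		return "z";

	out[5] = '\0';
	for (int i = 4; i >= 0; i--) {
		out[i] = (char)('!' + in % 85);
		in /= 85;
	}
	return out;
}
```

For example, ascii85_word(0x4D616E20, buf) yields "9jqo^", the classic encoding of the bytes "Man " read as a big-endian word.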
2024-10-07 | drm/xe/devcoredump: Improve section headings and add tile info | John Harrison
The xe_guc_exec_queue_snapshot is not really a GuC internal thing and is definitely not a GuC CT thing. So give it its own section heading. The snapshot itself is really a capture of the submission backend's internal state, although all it currently prints out is the submission contexts. So label it as 'Contexts'. If more general state is added later, then it could be changed to 'Submission backend' or some such.
Further, everything from the GuC CT section onwards is GT specific, but there was no indication of which GT it was related to (and that is impossible to work out from the other fields that are given). So add a GT section heading. Also include the tile id of the GT, because that is also significant information.
Lastly, drop a couple of unnecessary line feeds within sections.
v2: Add GT section heading, add tile id to device section.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-4-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/devcoredump: Use drm_puts and already cached local variables | John Harrison
There are a bunch of calls to drm_printf with static strings. Switch them to drm_puts instead.
There are also a bunch of 'coredump->snapshot.XXX' references when 'coredump->snapshot' has already been cached locally as 'ss'. So use 'ss->XXX' instead.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-3-John.C.Harrison@Intel.com
2024-10-07 | drm/xe/guc: Remove spurious line feed in debug print | John Harrison
Including line feeds at the start of a debug print messes up the output when sent to dmesg. The break appears between all the useful prefix information and the actual string being printed.
In this case, each block of data has a very clear start line and an extra delimiter is really not necessary. So don't do it.
v2: Fix typo in commit message (review feedback from Michal W.)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-2-John.C.Harrison@Intel.com
2024-10-07 | drm/i915/display: Fix spelling mistake "Uncomressed" -> "Uncompressed" | Colin Ian King
There is a spelling mistake in a drm_WARN message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241002074903.833232-1-colin.i.king@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-10-07 | drm/amdgpu: partially revert powerplay `__counted_by` changes | Alex Deucher
Partially revert commit 0ca9f757a0e2 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays").
The count attribute for these arrays does not get set until after the arrays are allocated and populated, leading to false UBSAN warnings.
Fixes: 0ca9f757a0e2 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3662
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
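The ordering problem can be shown with a hypothetical structure: once a flexible array is annotated with `__counted_by`, bounds checking treats the counter as the array's length, so the counter must be set before any element is written. The fallback macro, struct and function below are assumptions for illustration, not powerplay code.

```c
#include <stdlib.h>

/* Fallback so the sketch builds with or without compiler support. */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#   define __counted_by(member) __attribute__((__counted_by__(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)
#endif

/* Hypothetical table type; not a powerplay structure. */
struct clock_table {
	unsigned int count;
	unsigned int values[] __counted_by(count);
};

static struct clock_table *clock_table_create(const unsigned int *src,
					      unsigned int n)
{
	struct clock_table *t = malloc(sizeof(*t) + n * sizeof(t->values[0]));

	if (!t)
		return NULL;
	t->count = n;	/* first: this is the bound the checker will use */
	for (unsigned int i = 0; i < n; i++)
		t->values[i] = src[i];
	return t;
}
```

Code that fills values[] while count is still 0, and sets count only afterwards, is exactly the pattern that produces the false UBSAN reports described above.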
2024-10-07 | Documentation/gpu: Document the situation with unqualified drm-memory- | Tvrtko Ursulin
Currently it is not well defined what drm-memory- means compared to other categories.
In practice, the only driver which emits these keys is amdgpu, and it uses them to expose the current resident buffer object memory (including shared).
To prevent any confusion, document that drm-memory- is deprecated and an alias for drm-resident-memory-.
While at it, also clarify that the reserved sub-string 'memory' refers to the memory region component, and also clarify the intended semantics of other memory categories.
v2:
* Also mark drm-memory- as deprecated.
* Add some more text describing memory categories. (Alex)
v3:
* Semantics of the amdgpu drm-memory are actually those of drm-resident.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.keonig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amdkfd: SMI report dropped event count | Philip Yang
Add a new SMI event to report the dropped event count. If the event kfifo is full, the drop count is already non-zero, or there is not enough space left to store the event message, increase the drop count.
After reading events out of the kfifo, if events were dropped (drop_count is non-zero), generate a dropped-event record and reset the drop count to zero.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
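A minimal sketch of the drop-accounting rule, using a plain byte buffer instead of kfifo and a text "dropped N" record. The struct, the functions and the record format are hypothetical. In this sketch, once a drop is pending, later events are dropped too, so the drop record is delivered after the events that preceded the loss.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size event buffer standing in for a kfifo. */
#define FIFO_SIZE 64

struct smi_client {
	char buf[FIFO_SIZE];
	size_t used;
	unsigned int drop_count;
};

/* Producer: drop (and count) the event if it cannot be queued in order. */
static void smi_add_event(struct smi_client *c, const char *msg)
{
	size_t len = strlen(msg);

	if (c->drop_count || len > FIFO_SIZE - c->used) {
		c->drop_count++;
		return;
	}
	memcpy(c->buf + c->used, msg, len);
	c->used += len;
}

/*
 * Consumer: copy the queued events into out (NUL-terminated; out_len should
 * exceed FIFO_SIZE), then append a drop record if anything was lost.
 */
static void smi_read(struct smi_client *c, char *out, size_t out_len)
{
	size_t n = c->used < out_len - 1 ? c->used : out_len - 1;

	memcpy(out, c->buf, n);
	out[n] = '\0';
	c->used = 0;
	if (c->drop_count) {
		snprintf(out + n, out_len - n, "dropped %u\n", c->drop_count);
		c->drop_count = 0;
	}
}
```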
2024-10-07 | drm/amdgpu: Add sysfs interfaces for NPS mode | Lijo Lazar
Add a sysfs interface to see the available NPS modes to switch to:
  cat /sys/bus/pci/devices/../available_memory_partition
Make the current_memory_partition sysfs node read/write for requesting a new NPS mode. The request is only cached, and a driver unload/reload is required later to switch to the new NPS mode.
Ex:
  echo NPS1 > /sys/bus/pci/devices/../current_memory_partition
  echo NPS4 > /sys/bus/pci/devices/../current_memory_partition
The above interfaces will be available only if the SOC supports more than one NPS mode.
Also modify the current memory partition sysfs logic to be more generic.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
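A sketch of how a write to current_memory_partition might be parsed and cached, assuming each mode name maps to a distinct bit of a supported-mode mask. The enum values, struct and function are hypothetical, and NPS2/NPS8 are included only for illustration; the message above names just NPS1 and NPS4.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical mode values; each is a distinct bit so they can form a mask. */
enum nps_mode { NPS1 = 1, NPS2 = 2, NPS4 = 4, NPS8 = 8 };

struct nps_state {
	unsigned int supported;	/* OR of the modes the SOC supports */
	int requested;		/* cached; applied only after driver reload */
};

static const struct {
	const char *name;
	enum nps_mode mode;
} nps_names[] = {
	{ "NPS1", NPS1 }, { "NPS2", NPS2 }, { "NPS4", NPS4 }, { "NPS8", NPS8 },
};

/* sysfs-style store: accepts "NPS4" or "NPS4\n"; returns 0 or -EINVAL. */
static int nps_store(struct nps_state *s, const char *buf)
{
	size_t len = strcspn(buf, "\n");

	for (size_t i = 0; i < sizeof(nps_names) / sizeof(nps_names[0]); i++) {
		if (strlen(nps_names[i].name) != len ||
		    strncmp(buf, nps_names[i].name, len))
			continue;
		if (!(s->supported & nps_names[i].mode))
			return -EINVAL;
		s->requested = nps_names[i].mode;	/* only cached here */
		return 0;
	}
	return -EINVAL;
}
```

Consistent with the message, a real driver would also hide both nodes when fewer than two modes are supported.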
2024-10-07 | drm/amdgpu: Add gmc interface to request NPS mode | Lijo Lazar
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode.
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amdgpu/gfx10: Apply Isolation Enforcement to GFX & Compute rings | Srinivasan Shanmugam
This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v10_0 module.
The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively.
`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue.
`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay.
These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: fix hibernate entry for DCN35+ | Hamza Mahfooz
Two suspend-resume cycles are required to enter hibernate, and we only need to enable idle optimizations in the first cycle (which is pretty much equivalent to s2idle). So we can check in_s0ix to prevent the system from entering idle optimizations before it actually enters hibernate (from display's perspective). Also, call dc_set_power_state() before dc_allow_idle_optimizations(), since that ordering is safer: dc_set_power_state() writes to DMUB.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: Fetch the EDID from _DDC if available for eDP | Mario Limonciello
Some manufacturers have intentionally put an EDID in the ACPI _DDC method that differs from the EDID on the internal panel of laptops. Attempt to fetch this EDID if it exists and prefer it over the EDID provided by the panel. If a user prefers to use the EDID from the panel, offer a DC debugging parameter that disables this. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
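The selection rule above can be sketched as a small helper. It prefers the _DDC EDID when one was returned, and falls back to the panel's EDID when none exists or when a debug option disables the ACPI path. The struct, the function and the `disable_acpi_edid` flag are hypothetical names. The real DC debug parameter may be named and wired differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical EDID blob. */
struct edid_blob {
	const unsigned char *data;
	size_t len;
};

/* Prefer the ACPI _DDC EDID when present, unless a (hypothetical) debug
 * flag asks for the panel's own EDID instead. */
static const struct edid_blob *
pick_edp_edid(const struct edid_blob *acpi_ddc, const struct edid_blob *panel,
	      bool disable_acpi_edid)
{
	if (!disable_acpi_edid && acpi_ddc && acpi_ddc->len > 0)
		return acpi_ddc;
	return panel;
}
```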
2024-10-07 | drm/amd/display: remove redundant freesync parser for DP | Melissa Wen
When updating a connector under the drm_edid infrastructure, many calculations and validations are already done, making the equivalent code inside the AMD driver redundant. Remove that driver-specific code in favor of the DRM common code. Signed-off-by: Melissa Wen <mwen@igalia.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: always call connector_update when parsing freesync_caps | Melissa Wen
Update connector caps with drm_edid data before parsing info for freesync. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: switch to setting physical address directly | Melissa Wen
Connectors have the source physical address available in display info. Pass it to drm_dp_cec_attach() instead of parsing the EDID again. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid | Melissa Wen
Replace raw EDID handling (struct edid) with the opaque EDID type (struct drm_edid) in amdgpu_dm_connector for consistency. This may also prevent mismatched approaches in different parts of the driver code. Signed-off-by: Melissa Wen <mwen@igalia.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amdgpu: Add PSP interface for NPS switch | Rajneesh Bhardwaj
Implement a PSP ring command interface for on-the-fly memory partitioning on supported ASICs. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: 3.2.304 | Aric Cyr
This DC patchset brings improvements in multiple areas. In summary, we highlight: - Improvements to seamless boot. - Adjustments for DSC dock. - DML improvements. - DMCUB fixes for D0/D3 and new register offset. - Code cleanup. Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: Initialize new backlight_level_params structure | Kaitlyn Tse
[Why] Initialize the new backlight_level_params structure as part of the ABC framework. The information in this structure needs to be passed down to the DMCUB to identify the backlight control type, to adjust the backlight of the panel, and to perform any required conversions from PWM to nits or vice versa. [How] Created the initial framework of the backlight_level_params struct and modified existing functions to include the new structure. Reviewed-by: Harry Vanzylldejong <harry.vanzylldejong@amd.com> Reviewed-by: Iswara Nagulendran <iswara.nagulendran@amd.com> Reviewed-by: Anthony Koo <anthony.koo@amd.com> Signed-off-by: Kaitlyn Tse <Kaitlyn.Tse@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 | drm/amd/display: Initialize replay_config var | Kaitlyn Tse
[Why] Uninitialized variables could cause some bits to be set, thus enabling features unintentionally. [How] Initialize replay_config variable to avoid future issues. Reviewed-by: Harry Vanzylldejong <harry.vanzylldejong@amd.com> Reviewed-by: Iswara Nagulendran <iswara.nagulendran@amd.com> Reviewed-by: Anthony Koo <anthony.koo@amd.com> Signed-off-by: Kaitlyn Tse <Kaitlyn.Tse@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
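The fix described above follows a common C pattern, illustrated here with a hypothetical struct whose fields only loosely mirror a replay configuration: zero-initialize a stack variable so that any field the code does not set explicitly reads as 0/false instead of leftover stack contents. The struct and helper are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical subset of a replay configuration with feature bits. */
struct replay_config_sketch {
	bool replay_supported;
	bool replay_smu_opt_supported;
	unsigned int replay_power_opt_supported;
};

static struct replay_config_sketch make_replay_config(bool supported)
{
	/* Zero-initialize so every field not set below is 0/false rather
	 * than whatever happened to be on the stack. */
	struct replay_config_sketch cfg = {0};

	cfg.replay_supported = supported;
	return cfg;
}
```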