summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-14drm/amd/display: Reset DMCUB before HW initNicholas Kazlauskas
[Why] If the firmware wasn't reset by PSP or HW and is currently running then the firmware will hang or perform underfined behavior when we modify its firmware state underneath it. [How] Reset DMCUB before setting up cache windows and performing HW init. Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: [FW Promotion] Release 0.0.97Anthony Koo
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: Force det buf size to 192KB with 3+ streams and upscalingMichael Strauss
[WHY] This workaround resolves underflow caused by incorrect DST_Y_PREFETCH. Overriding to 192KB DET buf size until the DST_Y_PREFETCH calc is fixed. Reviewed-by: Eric Yang <Eric.Yang2@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: parse and check PSR SU capsMikita Lipski
[why] Adding a function to read PSR capabilities and ALPM capabilities. Also adding a helper function to validate if the sink and the driver support PSR SU. [how] - isolated all PSR and ALPM reading calls to a separate funciton - set all required PSR caps - added a helper function to check if PSR SU is supported by sink and the driver Reviewed-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Mikita Lipski <mikita.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: Add src/ext ID info for dummy serviceSolomon Chiu
[Why] Current error log of dummy irq service doesn't have src/ext ID info in the log. [How] Add src/ext ID in ack/set of dummy irq service. Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: Add debugfs entry for ILRWayne Lin
[Why & How] In order to know the intermediate link rates supported by the eDP panel and test to select the optimized link rate to save power, create a new debugfs entry "ilr_setting" for setting ILR. Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amd/display: Set exit_optimized_pwr_state for DCN31Nicholas Kazlauskas
[Why] SMU now respects the PHY refclk disable request from driver. This causes a hang during hotplug when PHY refclk was disabled because it's not being re-enabled and the transmitter control starts on dc_link_detect. [How] We normally would re-enable the clk with exit_optimized_pwr_state but this is only set on DCN21 and DCN301. Set it for dcn31 as well. This fixes DMCUB timeouts in the PHY. Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ") Reviewed-by: Eric Yang <Eric.Yang2@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/i915/debugfs: add noreclaim annotationsMatthew Auld
We have a debugfs hook to directly call into i915_gem_shrink() with the fs_reclaim acquire annotations to simulate hitting direct reclaim. However we should also annotate this with memalloc_noreclaim, which will set PF_MEMALLOC for us on the current context, to ensure we can't re-enter direct reclaim(just like "real" direct reclaim does). This is an issue now that ttm_bo_validate could potentially be called here, which might try to allocate a tiny amount of memory to hold the new ttm_resource struct, as per the below splat: [ 2507.913844] WARNING: possible recursive locking detected [ 2507.913848] 5.16.0-rc4+ #5 Tainted: G U [ 2507.913853] -------------------------------------------- [ 2507.913856] gem_exec_captur/1825 is trying to acquire lock: [ 2507.913861] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: kmem_cache_alloc_trace+0x30/0x390 [ 2507.913875] but task is already holding lock: [ 2507.913879] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915] [ 2507.913962] other info that might help us debug this: [ 2507.913966] Possible unsafe locking scenario: [ 2507.913970] CPU0 [ 2507.913973] ---- [ 2507.913975] lock(fs_reclaim); [ 2507.913979] lock(fs_reclaim); [ 2507.913983] DEADLOCK *** [ 2507.913988] May be due to missing lock nesting notation [ 2507.913992] 4 locks held by gem_exec_captur/1825: [ 2507.913997] #0: ffff888101f6e460 (sb_writers#17){..}-{0:0}, at: ksys_write+0xe9/0x1b0 [ 2507.914009] #1: ffff88812d99e2b8 (&attr->mutex){..}-{3:3}, at: simple_attr_write+0xbb/0x220 [ 2507.914019] #2: ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915] [ 2507.914085] #3: ffff8881b4a11b20 (reservation_ww_class_mutex){..}-{3:3}, at: ww_mutex_trylock+0x43f/0xcb0 [ 2507.914097] stack backtrace: [ 2507.914102] CPU: 0 PID: 1825 Comm: gem_exec_captur Tainted: G U 5.16.0-rc4+ #5 [ 2507.914109] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 2507.914115] Call Trace: [ 2507.914118] <TASK> [ 2507.914121] dump_stack_lvl+0x59/0x73 [ 2507.914128] __lock_acquire.cold+0x227/0x3b0 [ 2507.914135] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 2507.914141] ? __lock_acquire+0x23ca/0x5000 [ 2507.914147] lock_acquire+0x19c/0x4b0 [ 2507.914152] ? kmem_cache_alloc_trace+0x30/0x390 [ 2507.914157] ? lock_release+0x690/0x690 [ 2507.914163] ? lock_is_held_type+0xe4/0x140 [ 2507.914170] ? ttm_sys_man_alloc+0x47/0xb0 [ttm] [ 2507.914178] fs_reclaim_acquire+0x11a/0x160 [ 2507.914183] ? kmem_cache_alloc_trace+0x30/0x390 [ 2507.914188] kmem_cache_alloc_trace+0x30/0x390 [ 2507.914192] ? lock_release+0x37f/0x690 [ 2507.914198] ttm_sys_man_alloc+0x47/0xb0 [ttm] [ 2507.914206] ttm_bo_pipeline_gutting+0x70/0x440 [ttm] [ 2507.914214] ? ttm_mem_io_free+0x150/0x150 [ttm] [ 2507.914221] ? lock_is_held_type+0xe4/0x140 [ 2507.914227] ttm_bo_validate+0x2fb/0x370 [ttm] [ 2507.914234] ? lock_acquire+0x19c/0x4b0 [ 2507.914239] ? ttm_bo_bounce_temp_buffer.constprop.0+0xf0/0xf0 [ttm] [ 2507.914246] ? lock_acquire+0x131/0x4b0 [ 2507.914251] ? lock_is_held_type+0xe4/0x140 [ 2507.914257] i915_ttm_shrinker_release_pages+0x2bc/0x490 [i915] [ 2507.914339] ? i915_ttm_swap_notify+0x130/0x130 [i915] [ 2507.914429] ? i915_gem_object_release_mmap_offset+0x32/0x250 [i915] [ 2507.914529] i915_gem_shrink+0xb14/0x1290 [i915] [ 2507.914616] ? ___i915_gem_object_make_shrinkable+0x3e0/0x3e0 [i915] [ 2507.914698] ? _raw_spin_unlock_irqrestore+0x2d/0x60 [ 2507.914705] ? track_intel_runtime_pm_wakeref+0x180/0x230 [i915] [ 2507.914777] i915_gem_shrink_all+0x4b/0x70 [i915] [ 2507.914857] i915_drop_caches_set+0x227/0x2c0 [i915] Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211213125530.3960007-1-matthew.auld@intel.com
2021-12-14drm/i915/ttm: fix large buffer population trucationRobert Beckett
ttm->num_pages is uint32_t which was causing very large buffers to only populate a truncated size. This fixes gem_create@create-clear igt test on large memory systems. Fixes: 7ae034590cea ("drm/i915/ttm: add tt shmem backend") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211210195005.2582884-1-bob.beckett@collabora.com
2021-12-14drm: document DRM_IOCTL_MODE_GETFB2Simon Ser
There are a few details specific to the GETFB2 IOCTL. It's not immediately clear how user-space should check for the number of planes. Suggest using the handles field or the pitches field. The modifier array is filled with zeroes, ie. DRM_FORMAT_MOD_LINEAR. So explicitly tell user-space to not look at it unless the flag is set. Changes in v2 (Daniel): - Mention that handles should be used to compute the number of planes, and only refer to pitches as a fallback. - Reword bit about undefined modifier. Signed-off-by: Simon Ser <contact@emersion.fr> Acked-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Acked-by: Daniel Stone <daniels@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211123112400.22245-1-contact@emersion.fr
2021-12-14drm/i915: Test all device memory on probingChris Wilson
This extends the previous sanitychecking of device memory to read/write all the memory on the device during the device probe, ala memtest86, as an optional module parameter: i915.memtest=1. This is not expected to be fast, but a reasonably thorough verfification that the device memory is accessible and doesn't return bit errors. v2: Rebased. Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211208153404.27546-4-ramalingam.c@intel.com
2021-12-14drm/i915: Sanitycheck device iomem on probeChris Wilson
As we setup the memory regions for the device, give each a quick test to verify that we can read and write to the full iomem range. This ensures that our physical addressing for the device's memory is correct, and some reassurance that the memory is functional. v2: wrapper for memtest [Chris] v3: Removed the unused ptr i915 [Chris] v4: used the %pa for the resource_size_t. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211209162620.5218-1-ramalingam.c@intel.com
2021-12-14drm/i915: Exclude reserved stolen from driver useChris Wilson
Remove the portion of stolen memory reserved for private use from driver access. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211208153404.27546-2-ramalingam.c@intel.com
2021-12-14Merge v5.16-rc5 into drm-nextDaniel Vetter
Thomas Zimmermann requested a fixes backmerge, specifically also for 96c5f82ef0a1 ("drm/vc4: fix error code in vc4_create_object()") Just a bunch of adjacent changes conflicts, even the big pile of them in vc4. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-12-14drm/i915: Fix implicit use of struct pci_devMark Brown
intel_device_info.h references struct pci_dev but does not ensure that the struct has been declared, causing build failures if something in other headers changes so that the implicit dependency it is relying on is no longer satisfied: In file included from drivers/gpu/drm/i915/intel_device_info.h:32, from drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h:11, from drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:11: drivers/gpu/drm/i915/display/intel_display.h:643:39: error: 'struct pci_dev' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] 643 | bool intel_modeset_probe_defer(struct pci_dev *pdev); | ^~~~~~~ cc1: all warnings being treated as errors Add a declaration of the struct to fix this. Signed-off-by: Mark Brown <broonie@kernel.org> Fixes: 94b541f53db1 ("drm/i915: Add intel_modeset_probe_defer() helper") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211213170753.3680209-1-broonie@kernel.org
2021-12-14drm/mediatek: Set the default value of rotation to DRM_MODE_ROTATE_0Mark Yacoub
At the reset hook, call __drm_atomic_helper_plane_reset which is called at the initialization of the plane and sets the default value of rotation on all planes to DRM_MODE_ROTATE_0 which is equal to 1. Tested on Jacuzzi (MTK). Resolves IGT@kms_properties@plane-properties-{legacy,atomic} Signed-off-by: Mark Yacoub <markyacoub@chromium.org> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2021-12-13drm/msm/a6xx: Skip crashdumper state if GPU needs_hw_initRob Clark
I am seeing some crash logs which imply that we are trying to use crashdumper hw to read back GPU state when the GPU isn't initialized. This doesn't go well (for example, GPU could be in 32b address mode and ignoring the upper bits of buffer that it is trying to dump state to). I'm not *quite* sure how we get into this state in the first place, but lets not make a bad situation worse by triggering iova fault crashes. While we're at it, also add the information about whether the GPU is initialized to the devcore dump to make this easier to see in the logs (which makes the WARN_ON() redundant and even harmful because it fills up the small bit of dmesg we get with the crash report). Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20211209193118.1163248-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-12-13drm:amdgpu:remove unneeded variablechiminghao
return value form directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cm> Signed-off-by: chiminghao <chi.minghao@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: split amdgpu/index for readabilityYann Dirson
This starts to make the formated index much more manageable to the reader. Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Yann Dirson <ydirson@free.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drivers/amd/pm: drop statement to print FW version for smu_v13Mario Limonciello
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is fetched from debugfs rather than in kernel logs on every boot. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YCMario Limonciello
This value does not get cached into adev->pm.fw_version during startup for smu13 like it does for other SMU like smu12. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/display: fix function scopesIsabella Basso
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'get_highest_allowed_voltage_level' [-Wmissing-prototypes] 742 | unsigned int get_highest_allowed_voltage_level(uint32_t chip_family, uint32_t hw_internal_rev, uint32_t pci_revision_id) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ warning: no previous prototype for 'rv1_vbios_smu_send_msg_with_param' [-Wmissing-prototypes] 102 | int rv1_vbios_smu_send_msg_with_param(struct clk_mgr_internal *clk_mgr, unsigned int msg_id, unsigned int param) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Changes since v1: - As suggested by Rodrigo Siqueira: 1. Rewrite function signatures to make them more readable. 2. Get rid of unused functions in order to remove 'defined but not used' warnings. Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/display: Reduce stack size for dml31 UseMinimumDCFCLKMichel Dänzer
Use the struct display_mode_lib pointer instead of passing lots of large arrays as parameters by value. Addresses this warning (resulting in failure to build a RHEL debug kernel with Werror enabled): ../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘UseMinimumDCFCLK’: ../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:7478:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=] NOTE: AFAICT this function previously had no observable effect, since it only modified parameters passed by value and doesn't return anything. Now it may modify some values in struct display_mode_lib passed in by reference. Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/display: Reduce stack size for ↵Michel Dänzer
dml31_ModeSupportAndSystemConfigurationFull Move code using the Pipe struct to a new helper function. Works around[0] this warning (resulting in failure to build a RHEL debug kernel with Werror enabled): ../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘dml31_ModeSupportAndSystemConfigurationFull’: ../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:5740:1: warning: the frame size of 2144 bytes is larger than 2048 bytes [-Wframe-larger-than=] The culprit seems to be the Pipe struct, so pull the relevant block out into its own sub-function. (This is porting commit a62427ef9b55 ("drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull") from dml31 to dml21) [0] AFAICT this doesn't actually reduce the total amount of stack which can be used, just moves some of it from dml31_ModeSupportAndSystemConfigurationFull to the new helper function, so the former happens to no longer exceed the limit for a single function. Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: re-format file header commentsIsabella Basso
Fix the warning below: warning: Cannot understand * \file amdgpu_ioc32.c on line 2 - I thought it was a doc line Changes since v1: - As suggested by Alexander Deucher: 1. Reduce diff to minimum as this DOC section doesn't provide much value. Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: remove unnecessary variablesIsabella Basso
This fixes the warnings below, and also drops the display_count variable, as it's unused. In function 'svm_range_map_to_gpu': warning: variable 'bo_va' set but not used [-Wunused-but-set-variable] 1172 | struct amdgpu_bo_va bo_va; | ^~~~~ ... In function 'dcn201_update_clocks': warning: variable 'enter_display_off' set but not used [-Wunused-but-set-variable] 132 | bool enter_display_off = false; | ^~~~~~~~~~~~~~~~~ Changes since v1: - As suggested by Rodrigo Siqueira: 1. Drop display_count variable. - As suggested by Felix Kuehling: 1. Remove block surrounding amdgpu_xgmi_same_hive. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: fix amdgpu_ras_mca_query_error_status scopeIsabella Basso
This commit fixes the compile-time warning below: warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’ [-Wmissing-prototypes] Changes since v1: - As suggested by Alexander Deucher: 1. Make function static instead of adding prototype. Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: move variable to local scopeMario Limonciello
`edp_stream` is only used when backend is enabled on eDP, don't declare the variable outside that scope. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: add some extra checks that is_dig_enabled is definedMario Limonciello
There are a few places that this isn't checked that could potentially be a NULL pointer access. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: Reduce SG bo memory usage for mGPUsPhilip Yang
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map to GPU, multiple GPUs use same system memory dma mapping address, they can share the original mem->bo in attachment to reduce dma address array memory usage. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: Detect if amdgpu in IOMMU direct map modePhilip Yang
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Add amdgpu and dc glossaryRodrigo Siqueira
In the DC driver, we have multiple acronyms that are not obvious most of the time; the same idea is valid for amdgpu. This commit introduces a DC and amdgpu glossary in order to make it easier to navigate through our driver. Changes since V3: - Yann: Add new acronyms to amdgpu glossary - Daniel: Add link between dc and amdgpu glossary Changes since V2: - Add MMHUB Changes since V1: - Yann: Divide glossary based on driver context. - Alex: Make terms more consistent and update CPLIB - Add new acronyms to the glossary Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Add basic overview of DC pipelineRodrigo Siqueira
This commit describes how DCN works by providing high-level diagrams with an explanation of each component. In particular, it details the Global Sync signals. Change since V2: - Add a comment about MMHUBBUB. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: How to collect DTN logRodrigo Siqueira
Introduce how to collect DTN log from debugfs. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Document pipe split visual confirmationRodrigo Siqueira
Display core provides a feature that makes it easy for users to debug Pipe Split. This commit introduces how to use such a debug option. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Document amdgpu_dm_visual_confirm debugfs entryRodrigo Siqueira
Display core provides a feature that makes it easy for users to debug Multiple planes by enabling a visual notification at the bottom of each plane. This commit introduces how to use such a feature. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Reorganize DC documentationRodrigo Siqueira
Display core documentation is not well organized, and it is hard to find information due to the lack of sections. This commit reorganizes the documentation layout, and it is preparation work for future changes. Changes since V1: - Christian: Group amdgpu documentation together. - Daniel: Drop redundant amdgpu prefix. - Jani: Create index pages. - Yann: Mirror display folder in the documentation. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add support for SMU debug optionLang Yu
SMU firmware expects the driver maintains error context and doesn't interact with SMU any more when SMU errors occurred. That will aid in debugging SMU firmware issues. Add SMU debug option support for this request, it can be enabled or disabled via amdgpu_smu_debug debugfs file. Use a 32-bit mask to indicate corresponding debug modes. Currently, only one mode(HALT_ON_ERROR) is supported. When enabled, it brings hardware to a kind of halt state so that no one can touch it any more in the envent of SMU errors. The dirver interacts with SMU via sending messages. And threre are three ways to sending messages to SMU in current implementation. Handle them respectively as following: 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases Halt on any error. 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response() for longer timeout cases Halt on errors apart from ETIME. Otherwise this way won't work. Let the user handle ETIME error in such a case. 3, smu_cmn_send_msg_without_waiting() for no waiting cases Halt on errors apart from ETIME. Otherwise second way won't work. == Command Guide == 1, enable HALT_ON_ERROR mode # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug 2, disable HALT_ON_ERROR mode # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug v5: - Use bit mask to allow more debug features.(Evan) - Use WRAN() instead of BUG().(Evan) v4: - Set to halt state instead of a simple hang.(Christian) v3: - Use debugfs_create_bool().(Christian) - Put variable into smu_context struct. - Don't resend command when timeout. v2: - Resend command when timeout.(Lijo) - Use debugfs file instead of module parameter. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: introduce a kind of halt state for amdgpu deviceLang Yu
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: check df_funcs and its callback pointersHawking Zhang
in case they are not avaiable in early phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: don't override default ECO_BITs settingHawking Zhang
Leave this bit as hardware default setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORELe Ma
should count on GC IP base address Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: read and authenticate ip discovery binaryHawking Zhang
read and authenticate ip discovery binary getting from vram first, if it is not valid, read and authenticate the one getting from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add helper to verify ip discovery binary signatureHawking Zhang
To be used to check ip discovery binary signature Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: rename discovery_read_binary helperHawking Zhang
add _from_vram in the funciton name to diffrentiate the one used to read from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add helper to load ip_discovery binary from fileHawking Zhang
To be used when ip_discovery binary is not carried by vbios Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: fix incorrect VCN revision in SRIOVLeslie Shi
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and execute initialization work on it, but this causes VCN ib ring test failure on the disabled VCN instance during modprobe: amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1 amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110). amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110). [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110). v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to vcn_config v3: modify VCN's revision in SR-IOV and bare-metal Fixes: baf3f8f3740625 ("drm/amdgpu: handle SRIOV VCN revision parsing") Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()Leslie Shi
Fix following warning in SRIOV during modprobe: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: disable default navi2x co-op kernel supportJonathan Kim
This patch reverts the following: commit 48733b224fa7ba ("drm/amdkfd: add Navi2x to GWS init conditions") Disable GWS usage in default settings for now due to FW bugs. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: add Navi2x to GWS init conditionsGraham Sider
Initalize GWS on Navi2x with mec2_fw_version >= 0x42. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-and-tested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>