Age | Commit message (Collapse) | Author |
|
EnhancedPrefetchScheduleAccelerationFinal DCN35"
This reverts
commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")
[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7)
Cc: stable@vger.kernel.org
|
|
The following 3 commits landed in parallel:
commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization")
commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
While everything is set correctly, this caused the profile to be
reported incorrectly because both the powersave and fullscreen3d bits
were set in the mask and when the driver prints the profile, it looks
for the first bit set.
Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Flag KFD support for per-queue reset on GFX9 devices.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
- skip to print CE ACA log.
- optimize ACA log print for MCA.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Separated xgmi ta is required for specific APU, and driver needs
parse the ta binary properly with aux xgmi ta packed.
v2: make the check function more generic (Lijo)
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To check the status of S3 suspend completion,
use the PM core pm_suspend_global_flags bit(1)
to detect S3 abort events. Therefore, clean up
the AMDGPU driver's private flag suspend_complete.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In the normal S3 entry, the TOS cycle counter is not
reset during BIOS execution the _S3 method, so it doesn't
determine whether the _S3 method is executed exactly.
Howerver, the PM core performs the S3 suspend will set the
PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend
successfully. Therefore, drivers can check the
pm_suspend_global_flags bit(1) to detect the S3 suspend
abort event.
Fixes: 6704dbf71928 ("drm/amdgpu: update suspend status for aborting from deeper suspend")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
[ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[ 33.861816] Tainted: [W]=WARN
[ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[ 33.861822] Call Trace:
[ 33.861826] <TASK>
[ 33.861829] dump_stack_lvl+0x66/0x90
[ 33.861838] print_report+0xce/0x620
[ 33.861853] kasan_report+0xda/0x110
[ 33.862794] kasan_check_range+0xfd/0x1a0
[ 33.862799] __asan_memset+0x23/0x40
[ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.867135] dev_attr_show+0x43/0xc0
[ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0
[ 33.867155] seq_read_iter+0x3f8/0x1140
[ 33.867173] vfs_read+0x76c/0xc50
[ 33.867198] ksys_read+0xfb/0x1d0
[ 33.867214] do_syscall_64+0x90/0x160
...
[ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[ 33.867358] kasan_save_stack+0x33/0x50
[ 33.867364] kasan_save_track+0x17/0x60
[ 33.867367] __kasan_kmalloc+0x87/0x90
[ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu]
[ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[ 33.869608] local_pci_probe+0xda/0x180
[ 33.869614] pci_device_probe+0x43f/0x6b0
Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.
Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.
The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.
v2:
* Drop impossible v3_0 case. (Mario)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This version brings along following fixes:
- Fix polling DSC registers during S0i3
- Fix idle optimizations entry log
- Change MPC Tree visual confirm colours
- Fix underflow when playing 8K video in full screen mode
- Optimize power up sequence for specific OLED
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add some scruct for secure display.
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Previously dscl_prog_data stored pointer to sharpness 1dlut table.
SPL had four pre-generated tables, one for each setup. This allowed
us to minimize number of times we had to recalculate table when
switching between setups.
However, with dual display, this becomes an issue because for a given
setup, we could have a different per app sharpness value than the
global sharpness value. So the pre-generated table will change
but both displays may point to the same table and one of them
will have the wrong sharpness setting.
[How]
Store the sharpness 1dlut table in dscl_prog_data. This ensures
that each display can have its own sharpness setting.
Reviewed-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why & how]
DSC may be power gated when coming out of S0i3, so avoid polling
DSC registers since it will fail anyways. Only read if it is known
that DSC is in use.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
Whether we really enter idle optimizations are decided within DC.
Printing into dmesg before calling the DC API gives an incorrect
indication that we are entering idle optimization in cases where its
disabled manually.
To fix this, remove the print in DM and add them in DC
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
MPC background colours that use fractional components look different if
MPC OGAM is in use vs in bypass mode. The current red and orange colours
look very similar when OGAM is in bypass, so the colours need to change
to be consistently very easy to tell apart.
[How]
Use colours that only have 0 or MAX values in each component
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHAT & HOW]
The variable "ips_supported" is redundant and we can return from
dcn35_smu_get_ips_supported directly.
This fixes 1 UNUSED_VALUE issue reported by Coverity.
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHAT & HOW]
misc0, temp and split_pipe are assigned but immediately re-assigned
to other values. The early assignments are useless and are removed.
Unused variables are removed as well.
This fixes 5 UNUSED_VALUE issues reported by Coverity.
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
If max_downscale_src_width check fails, we exit early from
spl_calculate_scaler_params but dscl_prog_data is not fully
populated. If viewport is left at 0, it can cause crash in dml.
[How]
Call spl_set_dscl_prog_data before we exit early from
spl_calculate_scaler_params to populate dscl_prog_data
Populate taps in spl_get_optimal_number_of_taps
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
Flickering observed while playing 8k HEVC-10 bit video in full screen
mode with black border. We didn't support this case for subvp.
Make change to the existing check to disable subvp for this corner case.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Leo Ma <hanghong.ma@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
For Header related changes for core
[How]
Refactoring if and endif statements to enable DC_LOGGER
Reviewed-by: Mounika Adhuri <mounika.adhuri@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix DP Compliance test 4.2.1.3, 4.2.2.8, 4.3.1.12, 4.3.1.13
when IPS enabled.
Original HPD detection interval is set to 5s which violates DP
compliance.
Reduce the interval parameter, such that link training can be
finished within 5 seconds.
Fixes: afca033f10d3 ("drm/amd/display: Add periodic detection for IPS")
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
EnhancedPrefetchScheduleAccelerationFinal DCN35"
This reverts
commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")
[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY&HOW]
Adds support for P-State stall timeout detection in DCHUBBUB.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Spread on DPREFCLK by 0.3 percent can have a negative effect on sink
when PHY SSC is also spread by 0.3 percent
[How]
Add boot option for DMU to lower PHY SSC
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why & how]
OLED power up sequence takes an extra 150ms via hardcoded delay,
but there is a strict requirement on DisplayOn resume time.
For customer panel, remove these delays to meet target until a
cleaner solution is can be put in place.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Volatile only prevents the compiler from re-ordering reads and writes.
Since we always only modify the ring buffer from one CPU thread and have
an explicit barrier before signaling the HW this should have no effect at
all and just prevents compiler optimisations.
While at it drop the local variables as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Replace the last few remaining instances of string enable(d)/disable(d)
choices with the linux string choice helpers to avoid further
cocci warnings.
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023054655.4017489-1-sai.teja.pottumuttu@intel.com
|
|
There has been an update to the BSpec in which we need to set
tx_misc=0x5 field for C20 TX Context programming for HDMI TMDS for
Xe2_LPD and newer. That field is mapped to the bits 7:0 of
SRAM_GENERIC_<A/B>_TX_CNTX_CFG_1, which in turn translates to tx[1] of
our state struct. Update the algorithm to reflect this change.
v2:
- Fix Bspec reference (Sai Teja)
- Use struct intel_display instead of drm_i915_private. (Jani)
- Use the correct bit width for C20_PHY_TX_MISC_MASK. (Jani)
Bspec: 74491
Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> #v1
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023153352.144146-3-gustavo.sousa@intel.com
|
|
The variable crtc_state already contains everything that
intel_c20_compute_hdmi_tmds_pll() needs. Simplify the function's
signature by passing that struct instead of separate variables.
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023153352.144146-2-gustavo.sousa@intel.com
|
|
blk_rq_map_user_bvec currently only has ad-hoc checks for queue limits,
and the last fix to it enabled valid NVMe I/O to pass, but also allowed
invalid one for drivers that set a max_segment_size or seg_boundary
limit.
Fix it once for all by using the bio_split_rw_at helper from the I/O
path that indicates if and where a bio would be have to be split to
adhere to the queue limits, and it returns a positive value, turn that
into -EREMOTEIO to retry using the copy path.
Fixes: 2ff949441802 ("block: fix sanity checks in blk_rq_map_user_bvec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241028090840.446180-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path
of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
work queue with WQ_MEM_RECLAIM to ensure forward progress during
reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
progress during reclaim.
v2:
- Fixes tags (Philipp)
- Reword commit message (Philipp)
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <pstanner@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq")
Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Philipp Stanner <pstanner@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023235917.1836428-1-matthew.brost@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Fix the new "LPE0F28" code path using the uninitialized ctx variable
to log an error.
Fixes: 6668610b4d8c ("ASoC: Intel: sst: Support LPE0F28 ACPI HID")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410261106.EBx49ssy-lkp@intel.com/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patch.msgid.link/20241026143615.171821-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Clang-19 and above sometimes end up with multiple copies of the large
a6xx_hfi_msg_bw_table structure on the stack. The problem is that
a6xx_hfi_send_bw_table() calls a number of device specific functions to
fill the structure, but these create another copy of the structure on
the stack which gets copied to the first.
If the functions get inlined, that busts the warning limit:
drivers/gpu/drm/msm/adreno/a6xx_hfi.c:631:12: error: stack frame size (1032) exceeds limit (1024) in 'a6xx_hfi_send_bw_table' [-Werror,-Wframe-larger-than]
Fix this by kmalloc-ating struct a6xx_hfi_msg_bw_table instead of using
the stack. Also, use this opportunity to skip re-initializing this table
to optimize gpu wake up latency.
Cc: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/621814/
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
As there are duplicated kernel headers in tools/include libc can pick
up the wrong definitions. This was causing the wrong system call for
capget in perf.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: e25ebda78e230283 ("perf cap: Tidy up and improve capability testing")
Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The etnaviv_cmdbuf.c doesn't reference any functions or data members
defined in drm_mm.h, remove unneeded headers may reduce kernel compile
times.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Because it is not get used, drop it.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
The shader L1 cache is a writeback cache for shader loads/stores
and thus must be flushed before any BOs backing the shader buffers
are potentially freed.
Cc: stable@vger.kernel.org
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Since the kernel ringbuffers are allocated from a larger suballocated
area, same as the user commandbufs, they don't need to be CPU page
sized. Allocate 4KB for the kernel ring buffers, as we never use more
than that.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Etnaviv assumes that GPU page size is 4KiB, however, GPUVA ranges collision
when using softpin capable GPUs on a non 4KiB CPU page size configuration.
The root cause is that kernel side BO takes up bigger address space than
userspace expect, the size of backing memory of GEM buffer objects are
required to align to the CPU PAGE_SIZE. Therefore, results in userspace
allocated GPUVA range fails to be inserted to the specified hole exactly.
To solve this problem, record the GPU visiable size of a BO firstly, then
map and unmap the SG entry strictly with respect to the total GPUVA size.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
The GPU visible size of a GEM BO is not necessarily PAGE_SIZE aligned,
which happens when CPU page size is not equal to GPU page size. Extra
precious resources such as GPU page tables and GPU TLBs may being paid
because of this but never get used.
Track the size of GPU visible part of GEM BO separately, ensure no
GPUVA range wasting by aligning that size to GPU page size.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
To pick up the changes in:
7f053812dab3946c ("random: vDSO: minimize and simplify header includes")
That required adding a copy of include/vdso/unaligned.h and its checking
in tools/perf/check-headers.h.
Addressing this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/linux/unaligned.h include/linux/unaligned.h
Please see tools/include/uapi/README for further details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Large draws can make the GPU appear to be stuck to the current hangcheck
logic as the FE address will not move until the draw is finished. However,
the FE has a debug register, which records the current primitive ID within
a draw. Using this debug register we can extend the timeout as long as the
draw progresses.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Update state_hi.xml.h header from etna_viv commit
8f43a34fd9cd ("rndb: document FE current primitve debug reg")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
A later change will use the FE debug registers to improve GPU
progress monitoring. Instead of having to keep track of the
usage state of the debug registers and lock access to the
VIVS_HI_CLOCK_CONTROL register, statically enable debug
register access during GPU init.
The Vivante downstream driver seems to do the same thing since
a while, so it should be okay to keep access enabled. (See
gckHARDWARE_InitializeHardware in 6.4.11 downstream driver).
Many debug registers contain bogus data if clock gating is
enabled, so even if they are always accessible performance
profiling still needs to manage some prerequisites.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
To get the changes in:
924725707d80bc25 ("arm64: cputype: Add Neoverse-N3 definitions")
That makes this perf source code to be rebuilt:
CC /tmp/build/perf-tools/util/arm-spe.o
The changes in the above patch add MIDR_NEOVERSE_N3, that probably need
changes in arm-spe.c, so probably we need to add it to that array? Or
maybe we need to leave this for later when this is all tested on those
machines?
static const struct midr_range neoverse_spe[] = {
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
{},
};
Mark Rutland recommended about arm-spe.c in a previous update to this
file:
"I would not touch this for now -- someone would have to go audit the
TRMs to check that those other cores have the same encoding, and I think
it'd be better to do that as a follow-up."
That addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zx-dffKdGsgkhG96@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The perf counter read functions don't just read registers, but they
also mutate state to direct the reads towards the correct pipe and
engine. Assert that the GPU mutex is held at this point, so that
those state changes don't interfere with others.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
The perfmon sampling mutates shared GPU state (e.g. VIVS_HI_CLOCK_CONTROL
to select the pipe for the perf counter reads). To avoid clashing with
other functions mutating the same state (e.g. etnaviv_gpu_update_clock)
the perfmon sampling needs to hold the GPU lock.
Fixes: 68dc0b295dcb ("drm/etnaviv: use 'sync points' for performance monitor requests")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
To pick up the changes in this cset:
947697c6f0f75f98 ("uapi: Define GENMASK_U128")
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/bits.h include/uapi/linux/bits.h
diff -u tools/include/linux/bits.h include/linux/bits.h
Please see tools/include/uapi/README for further details.
Acked-by: Yury Norov <yury.norov@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zx-ZVH7bHqtFn8Dv@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In the etnaviv_pdev_probe() and etnaviv_gpu_platform_probe() function, the
value of '&pdev->dev' has been cached to the local auto variable 'dev'.
But some callers use 'dev', while the rest use '&pdev->dev'. To keep it
consistent, use 'dev' uniformly.
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Currently, the calling of mutex_destroy() is ignored on error handling
code path. It is safe for now, since mutex_destroy() actually does
nothing in non-debug builds. But the mutex_destroy() is used to mark
the mutex uninitialized on debug builds, and any subsequent use of the
mutex is forbidden.
It also could lead to problems if mutex_destroy() gets extended, add
missing mutex_destroy() to eliminate potential concerns.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Currently, the etnaviv_gem_submit.c isn't call any runtime power management
functions. So drop this unused header, we can include it back when it
really get used though.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|