summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-17drm/i915: Define more pipe timestamp registersVille Syrjälä
Add definitions for various pipe timestamp registers: - frame timestamp (last start of vblank) (g4x+), already had this defined - flip timestamp (when SURF was last written) (g4x+) - flipdone timestamp (when last flipdone was signalled) (tgl+) Note that on pre-tgl the flip related timestamps are only updated for primary plane flips, but on tgl+ we can select which plane updates them (via PIPE_MISC2). Let's define those related bits as well. Curiously VLV/CHV do not have the frame/flip timestamp registers, despite all the other related registers being inherited from g4x. This means we can get rid of the pipe_offsets[] usage for these, and thus the implicit dev_priv is gone as well. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-4-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-03-17drm/i915: s/PIPEMISC/PIPE_MISC/Ville Syrjälä
This PIPEMISC vs. PIPE_MISC inconsitency is ugly. Unify the naming (PIPE_MISC is also what bspec has always called it). Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-3-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-03-17drm/i915: Stop using pipe_offsets[] for PIPE_MISC*Ville Syrjälä
The PIPE_MISC registers don't exist on pre-bdw hardware, so there is no point in using pipe_offsets[] for them. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-2-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-03-17drm/ttm/ttm_bo: Provide a missing 'bulk' description and correct misnaming ↵Lee Jones
of 'placement' 'bulk' description taken from another in the same file. Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/ttm/ttm_bo.c:98: warning: Function parameter or member 'bulk' not described in 'ttm_bo_set_bulk_move' drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Function parameter or member 'placement' not described in 'ttm_bo_mem_space' drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Excess function parameter 'proposed_placement' description in 'ttm_bo_mem_space' Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230317081718.2650744-6-lee@kernel.org
2023-03-16drm/format-helper: Make "destination_pitch" test usable for monoArthur Grillo
This test case uses an arbitrary pitch size, different of the default one, to test if the conversions methods obey. Change the "destination_pitch" colors to change the monochrome expected result from being just zeros, as this makes the arbitrary pitch use unusable. Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311125141.564801-3-arthurgrillo@riseup.net
2023-03-16drm/format-helper: Add Kunit tests for drm_fb_xrgb8888_to_mono()Arthur Grillo
Extend the existing test cases to test the conversion from XRGB8888 to monochromatic. Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311125141.564801-2-arthurgrillo@riseup.net
2023-03-16drm/i915/display: ignore link training failures in CIVinod Govindapillai
If the ignore long HPD flag is set, ignore the link training failures as well. Because of spurious HPDs, some unexpected link training failures are happening while executing IGT test cases. Ignore the link training failures for the time being if the long HPDs are also ignored in the environments like CI. Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230215083832.287519-3-vinod.govindapillai@intel.com
2023-03-16drm/i915/display: ignore long HPDs based on a flagVinod Govindapillai
Some panels generate long HPD events even while connected to the port. This cause some unexpected CI execution issues. A new flag is added to track if such spurious long HPDs can be ignored and are not processed further if the flag is set. Debugfs entry is added to control the ignore long hpd flag. v2: Address patch styling comments (Jani Nikula) v3: Ignoring the HPD moved to hotplug handler and now applies to all types of outputs (Imre Deak) v4: use debugfs_create_bool and squash patches (Jani Nikula) Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230215083832.287519-2-vinod.govindapillai@intel.com
2023-03-16drm/nouveau/nvfw/acr: set wpr_generic_header_dump storage-class-specifier to ↵Tom Rix
static gcc with W=1 reports drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: error: no previous prototype for ‘wpr_generic_header_dump’ [-Werror=missing-prototypes] 49 | wpr_generic_header_dump(struct nvkm_subdev *subdev, | ^~~~~~~~~~~~~~~~~~~~~~~ wpr_generic_header_dump is only used in acr.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230302124819.686469-1-trix@redhat.com
2023-03-16drm/nouveau/fifo: set nvkm_engn_cgrp_get storage-class-specifier to staticTom Rix
smatch reports drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:33:18: warning: symbol 'nvkm_engn_cgrp_get' was not declared. Should it be static? nvkm_engn_cgrp_get is only used in runl.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230228221533.3240520-1-trix@redhat.com
2023-03-16drm/nouveau/fifo: set gf100_fifo_nonstall_block_dump storage-class-specifier ↵Tom Rix
to static gcc with W=1 reports drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: error: no previous prototype for ‘gf100_fifo_nonstall_block’ [-Werror=missing-prototypes] 451 | gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index) | ^~~~~~~~~~~~~~~~~~~~~~~~~ gf100_fifo_nonstall_block is only used in gf100.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230303132731.1919329-1-trix@redhat.com
2023-03-16drm/i915/opregion: Fix CONFIG_ACPI=n builds adding missing ↵Imre Deak
intel_opregion_cleanup() prototype Add the missing intel_opregion_cleanup() prototype fixing CONFIG_ACPI=n builds. Fixes: 3e226e4a2180 ("drm/i915/opregion: Cleanup opregion after errors during driver loading") Cc: Jani Nikula <jani.nikula@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303141610.6L1VO7Gw-lkp@intel.com/ Signed-off-by: Imre Deak <imre.deak@intel.com>
2023-03-15drm/amdgpu: Don't resume IOMMU after incomplete initFelix Kuehling
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other kgd2kfd calls. This should fix IOMMU errors on resume from suspend when KFD IOMMU initialization failed. Reported-by: Matt Fagnani <matt.fagnani@bell.net> Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454 Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> Cc: stable@vger.kernel.org Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Matt Fagnani <matt.fagnani@bell.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd: fix compilation issue with legacy gccbobzhou
This patch is used to fix following compilation issue with legacy gcc error: ‘for’ loop initial declarations are only allowed in C99 mode Signed-off-by: bobzhou <bob.zhou@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Retire pcie_gen3_enable functionHawking Zhang
Not needed since from vi. drop the function so we don't duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Move to common helper to query soc rev_idHawking Zhang
Replace soc15, nv, soc21 get_rev_id callback with common helper so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Move to common indirect reg access helperHawking Zhang
Replace soc15, nv, soc21 specific callbacks with common one. so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: drop ras check at asic level for new blocksHawking Zhang
amdgpu_ras_register_ras_block should always be invoked by ras_sw_init, where driver needs to check ras caps at ip level, instead of asic level. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Rework pcie_bif ras sw_initHawking Zhang
pcie_bif ras blocks needs to be initialized as early as possible to handle fatal error detected in hw_init phase. also align the pcie_bif ras sw_init with other ras blocks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Rework xgmi_wafl_pcs ras sw_initHawking Zhang
To align with other IP blocks. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Rework mca ras sw_initHawking Zhang
To align with other IP blocks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdkfd: Fixed kfd_process cleanup on module exit.David Belanger
Handle case when module is unloaded (kfd_exit) before a process space (mm_struct) is released. v2: Fixed potential race conditions by removing all kfd_process from the process table first, then working on releasing the resources. v3: Fixed loop element access / synchronization. Fixed extra empty lines. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: 3.2.227Aric Cyr
This version brings along the following: - FW Release 0.0.158.0 - Fixes to HDCP, DP MST and more - Improvements on USB4 links and more - Code re-architecture on link.h Reviewed-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: [FW Promotion] Release 0.0.158.0Anthony Koo
[Why & How] Add boot control bit to control dispclk and dppclk deep sleep Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: fix assert conditionSamson Tam
[Why & How] Reversed assert condition when checking that phy_pix_clk[] is not 0 Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Clearly states if long or short HPD event in dmesg logsStylon Wang
[Why] The log "DMUB HPD callback" is crucial to identify when DP tunneling is been established and driver is notified of this event from DMUB. Same log is shared for long and short hotplug event and we need to check trailing DC debug log to distinguish between them two, making debugging on DPIA related issues a bit more troublesome. [How] Clearly states in dmesg logs whether this is a long or short hotplug event. Reviewed-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Make DCN32 functions available to future DCNsWesley Chalmers
[Why & How] Make DCN32 functions available for more DCNs. Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOVYifan Zha
[Why] If disable the mmhub vm contexts(set MMVM_CONTEXTS_DISABLE to 0xffff), driver loading failed on vf due to fence fallback timer expired on all rings. FLR cannot reset MMVM_CONTEXTS_DISABLE. So this vf can not be recovered anymore unless trigger a whole gpu reset. [How] Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Horace Chen <Horace.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: reallocate DET for dual displays with high pixel rate ratioSamson Tam
[Why] For dual displays where pixel rate is much higher on one display, we may get underflow when DET is evenly allocated. [How] Allocate less DET segments for the lower pixel rate display and more DET segments for the higher pixel rate display Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: disconnect MPCC only on OTG changeAyush Gupta
[Why] Framedrops are observed while playing Vp9 and Av1 10 bit video on 8k resolution using VSR while playback controls are disappeared/appeared [How] Now ODM 2 to 1 is disabled for 5k or greater resolutions on VSR. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Ayush Gupta <ayugupta@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: Fix DP MST sinks removal issueCruise Hung
[Why] In USB4 DP tunneling, it's possible to have this scenario that the path becomes unavailable and CM tears down the path a little bit late. So, in this case, the HPD is high but fails to read any DPCD register. That causes the link connection type to be set to sst. And not all sinks are removed behind the MST branch. [How] Restore the link connection type if it fails to read DPCD register. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amd/display: hpd rx irq not working with eDP interfaceRobin Chen
[Why] This is the fix for the defect of commit ab144f0b4ad6 ("drm/amd/display: Allow individual control of eDP hotplug support"). [How] To revise the default eDP hotplug setting and use the enum to git rid of the magic number for different options. Fixes: ab144f0b4ad6 ("drm/amd/display: Allow individual control of eDP hotplug support") Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Robin Chen <robin.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/i915: Don't send idle pattern after DP2.0 link trainingVille Syrjälä
Bspec calls us to select pattern 2 after link training for DP 2.0. Let's do that... by doing nothing because we will be transmitting pattern 2 at the end of the link training already. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230308212627.7601-2-ville.syrjala@linux.intel.com
2023-03-15drm/i915: Don't switch to TPS1 when disabling DP_TP_CTLVille Syrjälä
AFAICS Bspec has never asked us to switch to TPS1 when *disabling* DP_TP_CTL. Let's stop doing that in case it confuses something. We do have to switch before we *enable* DP_TP_CTL, but that is already being handled correctly. v2: Do the same for FDI v3: Rebase Reviewed-by: Imre Deak <imre.deak@intel.com> #v1 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230308212627.7601-1-ville.syrjala@linux.intel.com
2023-03-15drm/vmwgfx: Fix src/dst_pitch confusionZack Rusin
The src/dst_pitch got mixed up during the rework of the function, make sure the offset's refer to the correct one. Spotted by clang: Clang warns (or errors with CONFIG_WERROR): drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:509:29: error: variable 'dst_pitch' is uninitialized when used here [-Werror,-Wuninitialized] src_offset = ddirty->top * dst_pitch + ddirty->left * stdu->cpp; ^~~~~~~~~ drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:492:26: note: initialize the variable 'dst_pitch' to silence this warning s32 src_pitch, dst_pitch; ^ = 0 1 error generated. Signed-off-by: Zack Rusin <zackr@vmware.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Dave Airlie <airlied@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1811 Fixes: 39985eea5a6d ("drm/vmwgfx: Abstract placement selection") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314211445.1363828-1-zack@kde.org
2023-03-15drm: Track clients by tgid and not tidTvrtko Ursulin
Thread group id (aka pid from userspace point of view) is a more interesting thing to show as an owner of a DRM fd, so track and show that instead of the thread id. In the next patch we will make the owner updated post file descriptor handover, which will also be tgid based to avoid ping-pong when multiple threads access the fd. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314141904.1210824-2-tvrtko.ursulin@linux.intel.com
2023-03-15accel/habanalabs: Drop redundant pci_enable_pcie_error_reporting()Bjorn Helgaas
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since commit f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()Tomer Tayar
The memory manager IDR is currently destroyed when user releases the file descriptor. However, at this point the user context might be still held, and memory buffers might be still in use. Later on, calls to release those buffers will fail due to not finding their handles in the IDR, leading to a memory leak. To avoid this leak, split the IDR destruction from the memory manager fini, and postpone it to hpriv_release() when there is no user context and no buffers are used. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: move soft-reset wait to soft-reset executeDafna Hirschfeld
We plan to do soft-reset either by mmio or by using cpucp packet depending on the FW version. We don't want to check FW version in two different places for that (execute soft-reset and wait to soft-reset) so move the waiting to gaudi2_execute_soft_reset. This also makes sense because the cpucp also does the waiting. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: add uapi to stall/resume engineKoby Elbaz
The user might want to stall/resume engines to perform power testing for various scenarios. Because our current HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores, we need to add another opcode for handling entire engine and not just its core. The user supplies an array, where each entry holds the engine's ID and the command to send to the engine. The size of the array is limited by the number of engines in the ASIC (only Gaudi2 is currently supported). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: use scnprintf() in print_device_in_use_info()Tomer Tayar
compose_device_in_use_info() was added to handle the snprintf() return value in a single place. However, the buffer size in print_device_in_use_info() is set such that it would be enough for the max possible print, so compose_device_in_use_info() is not really needed. Moreover, scnprintf() can be used instead of snprintf(), to save the check if the return value larger than the given size. Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15accel/habanalabs: unify err log of hw-fini failure in dirty stateDafna Hirschfeld
print more informative message when failing in dirty state Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15accel/habanalabs: use a mutex rather than a spinlockKoby Elbaz
There are two reasons why mutex is better here: 1. There's a critical section relatively long, where in certain scenarios (e.g., multiple VM allocations) taking a spinlock might cause noticeable performance degradation. 2. It will remove the incorrect usage of mutex under spin_lock (where preemption is disabled). Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: allow getting HL_INFO_DRAM_USAGE during soft-resetDafna Hirschfeld
We can allow userspace to query the dram usage during soft-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: fix register address on PDMA/EDMA idle checkKoby Elbaz
The PDMA/EDMA is_idle routines didn't check the correct CORE register in order to get the accurate idle state. Moreover, it's better to make the is_idle routine more robust by adding additional checks (IS_HALTED) before announcing that the core is idle. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: remove a useless is_idle TPC flagKoby Elbaz
Is appears that the flag - DCORE0_TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK, has no actual use when it comes to querying TPC idleness, since this flag's corresponding bit turns-off after stalling the engine, and turns back on after resuming it. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: fix few misspelled words in the codefarah kassabri
Run spell checker on the code and fix accordingly. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: verify return code after scrubbing ARCs DCCMsKoby Elbaz
In case the KDMA fails scrubbing the DCCMs (following a soft-reset upon device release), the driver will only print failure until reset flow ends, rather than escalating it into a hard-reset. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: use notifications and graceful reset for decoderTomer Tayar
Add notifications to user in case of decoder abnormal interrupts, and use the graceful reset mechanism if reset is required. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15accel/habanalabs: assert return value of hw_finiDafna Hirschfeld
Since hw_fini return error code for failure indication, we should check its return value. Currently it might only fail upon soft-reset from hl_device_reset. Later patch will add hw_fini failure in case of polling timeout in hard-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>