summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-13drm/amd: add some extra checks that is_dig_enabled is definedMario Limonciello
There are a few places that this isn't checked that could potentially be a NULL pointer access. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: Reduce SG bo memory usage for mGPUsPhilip Yang
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map to GPU, multiple GPUs use same system memory dma mapping address, they can share the original mem->bo in attachment to reduce dma address array memory usage. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: Detect if amdgpu in IOMMU direct map modePhilip Yang
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Add amdgpu and dc glossaryRodrigo Siqueira
In the DC driver, we have multiple acronyms that are not obvious most of the time; the same idea is valid for amdgpu. This commit introduces a DC and amdgpu glossary in order to make it easier to navigate through our driver. Changes since V3: - Yann: Add new acronyms to amdgpu glossary - Daniel: Add link between dc and amdgpu glossary Changes since V2: - Add MMHUB Changes since V1: - Yann: Divide glossary based on driver context. - Alex: Make terms more consistent and update CPLIB - Add new acronyms to the glossary Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Add basic overview of DC pipelineRodrigo Siqueira
This commit describes how DCN works by providing high-level diagrams with an explanation of each component. In particular, it details the Global Sync signals. Change since V2: - Add a comment about MMHUBBUB. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: How to collect DTN logRodrigo Siqueira
Introduce how to collect DTN log from debugfs. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Document pipe split visual confirmationRodrigo Siqueira
Display core provides a feature that makes it easy for users to debug Pipe Split. This commit introduces how to use such a debug option. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Document amdgpu_dm_visual_confirm debugfs entryRodrigo Siqueira
Display core provides a feature that makes it easy for users to debug Multiple planes by enabling a visual notification at the bottom of each plane. This commit introduces how to use such a feature. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13Documentation/gpu: Reorganize DC documentationRodrigo Siqueira
Display core documentation is not well organized, and it is hard to find information due to the lack of sections. This commit reorganizes the documentation layout, and it is preparation work for future changes. Changes since V1: - Christian: Group amdgpu documentation together. - Daniel: Drop redundant amdgpu prefix. - Jani: Create index pages. - Yann: Mirror display folder in the documentation. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add support for SMU debug optionLang Yu
SMU firmware expects the driver maintains error context and doesn't interact with SMU any more when SMU errors occurred. That will aid in debugging SMU firmware issues. Add SMU debug option support for this request, it can be enabled or disabled via amdgpu_smu_debug debugfs file. Use a 32-bit mask to indicate corresponding debug modes. Currently, only one mode(HALT_ON_ERROR) is supported. When enabled, it brings hardware to a kind of halt state so that no one can touch it any more in the envent of SMU errors. The dirver interacts with SMU via sending messages. And threre are three ways to sending messages to SMU in current implementation. Handle them respectively as following: 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases Halt on any error. 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response() for longer timeout cases Halt on errors apart from ETIME. Otherwise this way won't work. Let the user handle ETIME error in such a case. 3, smu_cmn_send_msg_without_waiting() for no waiting cases Halt on errors apart from ETIME. Otherwise second way won't work. == Command Guide == 1, enable HALT_ON_ERROR mode # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug 2, disable HALT_ON_ERROR mode # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug v5: - Use bit mask to allow more debug features.(Evan) - Use WRAN() instead of BUG().(Evan) v4: - Set to halt state instead of a simple hang.(Christian) v3: - Use debugfs_create_bool().(Christian) - Put variable into smu_context struct. - Don't resend command when timeout. v2: - Resend command when timeout.(Lijo) - Use debugfs file instead of module parameter. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: introduce a kind of halt state for amdgpu deviceLang Yu
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: check df_funcs and its callback pointersHawking Zhang
in case they are not avaiable in early phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: don't override default ECO_BITs settingHawking Zhang
Leave this bit as hardware default setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORELe Ma
should count on GC IP base address Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: read and authenticate ip discovery binaryHawking Zhang
read and authenticate ip discovery binary getting from vram first, if it is not valid, read and authenticate the one getting from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add helper to verify ip discovery binary signatureHawking Zhang
To be used to check ip discovery binary signature Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: rename discovery_read_binary helperHawking Zhang
add _from_vram in the funciton name to diffrentiate the one used to read from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add helper to load ip_discovery binary from fileHawking Zhang
To be used when ip_discovery binary is not carried by vbios Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: fix incorrect VCN revision in SRIOVLeslie Shi
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and execute initialization work on it, but this causes VCN ib ring test failure on the disabled VCN instance during modprobe: amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1 amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110). amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110). [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110). v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to vcn_config v3: modify VCN's revision in SR-IOV and bare-metal Fixes: baf3f8f3740625 ("drm/amdgpu: handle SRIOV VCN revision parsing") Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()Leslie Shi
Fix following warning in SRIOV during modprobe: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: disable default navi2x co-op kernel supportJonathan Kim
This patch reverts the following: commit 48733b224fa7ba ("drm/amdkfd: add Navi2x to GWS init conditions") Disable GWS usage in default settings for now due to FW bugs. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: add Navi2x to GWS init conditionsGraham Sider
Initalize GWS on Navi2x with mec2_fw_version >= 0x42. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-and-tested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: only hw fini SMU fisrt for ASICs need thatLang Yu
We found some headaches on ASICs don't need that, so remove that for them. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: remove power on/off SDMA in SMU hw_init/fini()Lang Yu
Currently, we don't find some neccesities to power on/off SDMA in SMU hw_init/fini(). It makes more sense in SDMA hw_init/fini(). Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: Make KFD support on Hawaii experimentalFelix Kuehling
Hawaii support is mostly untested these days. ROCm user mode also depends on custom firmware for AQL packet processing, that was never pushed upstream due to quality regressions in graphics driver testing. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: Don't split unchanged SVM rangesFelix Kuehling
If an existing SVM range overlaps an svm_range_set_attr call, we would normally split it in order to update only the overlapping part. However, if the attributes of the existing range would not be changed splitting it is unnecessary. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: Fix svm_range_is_same_attrsFelix Kuehling
The existing function doesn't compare the access bitmaps and flags. This can result in failure to update those attributes in existing ranges when all other attributes remained unchanged. Because the access and flags attributes modify only some bits in the respective bitmaps, we cannot compare them directly. Instead we need to check whether applying the attributes to a particular range would change the bitmaps. A PREFETCH_LOC attribute must always trigger a migration, even if the attribute value remains unchanged. E.g. if some pages were migrated due to a CPU page fault, a prefetch must still be executed to migrate pages back to VRAM. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: Fix error handling in svm_range_addFelix Kuehling
Add null-pointer check after the last svm_range_new call. This was originally reported by Zhou Qingyang <zhou1615@umn.edu> based on a static analyzer. To avoid duplicating the unwinding code from svm_range_handle_overlap, I merged the two functions into one. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Zhou Qingyang <zhou1615@umn.edu> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: Handle fault with same timestampPhilip Yang
Remove not unique timestamp WARNING as same timestamp interrupt happens on some chips, Drain fault need to wait for the processed_timestamp to be truly greater than the checkpoint or the ring to be empty to be sure no stale faults are handled. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctlIsabella Basso
This fixes the warning below by changing the prototype to a location that's actually included by the .c files that call amdgpu_kms_compat_ioctl: warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’ [-Wmissing-prototypes] 37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: append missing includesIsabella Basso
This fixes warnings caused by global functions lacking prototypes:, such as: warning: no previous prototype for 'dcn303_hw_sequencer_construct' [-Wmissing-prototypes] 12 | void dcn303_hw_sequencer_construct(struct dc *dc) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... warning: no previous prototype for ‘amdgpu_has_atpx’ [-Wmissing-prototypes] 76 | bool amdgpu_has_atpx(void) { | ^~~~~~~~~~~~~~~ Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdkfd: fix function scopesIsabella Basso
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'pm_set_resources_vi' [-Wmissing-prototypes] 113 | int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer, | ^~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: fix function scopesIsabella Basso
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes] 399 | int amdgpu_vkms_output_init(struct drm_device *dev, | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: fix improper docstring syntaxIsabella Basso
This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: Mark IP_BASE definition as __maybe_unusedIsabella Basso
Silences 166 compile-time warnings like: warning: 'UVD0_BASE' defined but not used [-Wunused-const-variable=] 129 | static const struct IP_BASE UVD0_BASE ={ { { { 0x00007800, 0x00007E00, 0, 0, 0 } }, | ^~~~~~~~~ warning: 'UMC0_BASE' defined but not used [-Wunused-const-variable=] 123 | static const struct IP_BASE UMC0_BASE ={ { { { 0x00014000, 0, 0, 0, 0 } }, | ^~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10sZhigang Luo
For the ASIC has big FB, it need more time to clear FB during reset. This change extended SRIOV VF waiting reset completion timeout from 5s to 10s. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Acked-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: recover XGMI topology for SRIOV VF after resetZhigang Luo
For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: added PSP XGMI initialization for SRIOV VF during recoverZhigang Luo
For SRIOV VF, XGMI was not initialized in PSP during recover. This change added PSP XGMI initialization for SRIOV VF during recover. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: skip reset other device in the same hive if it's SRIOV VFZhigang Luo
On SRIOV, host driver can support FLR(function level reset) on individual VF within the hive which might bring the individual device back to normal without the necessary to execute the hive reset. If the FLR failed , host driver will trigger the hive reset, each guest VF will get reset notification before the real hive reset been executed. The VF device can handle the reset request individually in it's reset work handler. This change updated gpu recover sequence to skip reset other device in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/display: Add feature flags to disable LTTPRAurabindo Pillai
[Why] Allow for disabling non transparent mode of LTTPR for running tests. [How] Add a feature flag and set them during init sequence. The flags are already being used in DC. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: enable RAS poison flag when GPU is connected to CPUTao Zhou
The RAS poison mode is enabled by default on the platform. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd/display: Add Debugfs Entry to Force in SST SequenceFangzhi Zuo
It is to force SST sequence on MST capable receivers. v2: squash in compilation fix when CONFIG_DRM_AMD_DC_DCN is not set Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13kunit: tool: suggest using decode_stacktrace.sh on kernel crashDaniel Latypov
kunit.py isn't very clear that 1) it stashes a copy of the unparsed output in $BUILD_DIR/test.log 2) it sets $BUILD_DIR=.kunit by default So it's trickier than it should be for a user to come up with the right command to do so. Make kunit.py print out a command for this if a) we saw a test case crash b) we only ran one kernel (test.log only contains output from the last) Example suggested command: $ scripts/decode_stacktrace.sh .kunit/vmlinux .kunit < .kunit/test.log | tee .kunit/decoded.log | ./tools/testing/kunit/kunit.py parse Without debug info a user might see something like [14:11:25] Call Trace: [14:11:25] ? kunit_binary_assert_format (:?) [14:11:25] kunit_try_run_case (test.c:?) [14:11:25] ? __kthread_parkme (kthread.c:?) [14:11:25] kunit_generic_run_threadfn_adapter (try-catch.c:?) [14:11:25] ? kunit_generic_run_threadfn_adapter (try-catch.c:?) [14:11:25] kthread (kthread.c:?) [14:11:25] new_thread_handler (:?) [14:11:25] [CRASHED] `tee` is in GNU coreutils, so it seems fine to add that into the pipeline by default, that way users can inspect the otuput in more detail. Note: to turn on debug info, users would need to do something like $ echo -e 'CONFIG_DEBUG_KERNEL=y\nCONFIG_DEBUG_INFO=y' >> .kunit/.kunitconfig $ ./tools/testing/kunit/kunit.py config $ ./tools/testing/kunit/kunit.py build $ <then run decode_stacktrace.sh now vmlinux is updated> This feels too clunky to include in the instructions. With --kconfig_add [1], it would become a bit less painful. [1] https://lore.kernel.org/linux-kselftest/20211106013058.2621799-2-dlatypov@google.com/ Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: tool: reconfigure when the used kunitconfig changesDaniel Latypov
Problem: currently, if you remove something from your kunitconfig, kunit.py will not regenerate the .config file. The same thing happens if you did --kunitconfig_add=CONFIG_KASAN=y [1] and then ran again without it. Your new run will still have KASAN. The reason is that kunit.py won't regenerate the .config file if it's a superset of the kunitconfig. This speeds it up a bit for iterating. This patch adds an additional check that forces kunit.py to regenerate the .config file if the current kunitconfig doesn't match the previous one. What this means: * deleting entries from .kunitconfig works as one would expect * dropping a --kunitconfig_add also triggers a rebuild * you can still edit .config directly to turn on new options We implement this by creating a `last_used_kunitconfig` file in the build directory (so .kunit, by default) after we generate the .config. When comparing the kconfigs, we compare python sets, so duplicates and permutations don't trip us up. The majority of this patch is adding unit tests for the existing logic and for the new case where `last_used_kunitconfig` differs. [1] https://lore.kernel.org/linux-kselftest/20211106013058.2621799-2-dlatypov@google.com/ Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: tool: revamp message for invalid kunitconfigDaniel Latypov
The current error message is precise, but not very clear if you don't already know what it's talking about, e.g. > $ make ARCH=um olddefconfig O=.kunit > ERROR:root:Provided Kconfig is not contained in validated .config. Following fields found in kunitconfig, but not in .config: CONFIG_DRM=y Try to reword the error message so that it's * your missing options usually have unsatisified dependencies * if you're on UML, that might be the cause (it is, in this example) Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: tool: add --kconfig_add to allow easily tweaking kunitconfigsDaniel Latypov
E.g. run tests but with KASAN $ ./tools/testing/kunit/kunit.py run --arch=x86_64 --kconfig_add=CONFIG_KASAN=y This also works with --kunitconfig $ ./tools/testing/kunit/kunit.py run --arch=x86_64 --kunitconfig=fs/ext4 --kconfig_add=CONFIG_KASAN=y This flag is inspired by TuxMake's --kconfig-add, see https://gitlab.com/Linaro/tuxmake#examples. Our version just uses "_" as the delimiter for consistency with pre-existing flags like --build_dir, --make_options, --kernel_args, etc. Note: this does make it easier to run into a pre-existing edge case: $ ./tools/testing/kunit/kunit.py run --arch=x86_64 --kconfig_add=CONFIG_KASAN=y $ ./tools/testing/kunit/kunit.py run --arch=x86_64 This second invocation ^ still has KASAN enabled! kunit.py won't call olddefconfig if our current .config is already a superset of the provided kunitconfig. Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: tool: move Kconfig read_from_file/parse_from_string to package-levelDaniel Latypov
read_from_file() clears its `self` Kconfig object and parses a config file. It is a way to construct Kconfig objects more so than an operation on Kconfig objects. This is reflected in the fact its only ever used as: kconfig = kunit_config.Kconfig() kconfig.read_from_file(path) So clean this up and simplify callers by replacing it with kconfig = kunit_config.parse_file(path) Do the same thing for the related parse_from_string() function as well. Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: tool: print parsed test results fully incrementallyDaniel Latypov
With the parser rework [1] and run_kernel() rework [2], this allows the parser to print out test results incrementally. Currently, that's held up by the fact that the LineStream eagerly pre-fetches the next line when you call pop(). This blocks parse_test_result() from returning until the line *after* the "ok 1 - test name" line is also printed. One can see this with the following example: $ (echo -e 'TAP version 14\n1..3\nok 1 - fake test'; sleep 2; echo -e 'ok 2 - fake test 2'; sleep 3; echo -e 'ok 3 - fake test 3') | ./tools/testing/kunit/kunit.py parse Before this patch [1]: there's a pause before 'fake test' is printed. After this patch: 'fake test' is printed out immediately. This patch also adds * a unit test to verify LineStream's behavior directly * a test case to ensure that it's lazily calling the generator * an explicit exception for when users go beyond EOF [1] https://lore.kernel.org/linux-kselftest/20211006170049.106852-1-dlatypov@google.com/ [2] https://lore.kernel.org/linux-kselftest/20211005011340.2826268-1-dlatypov@google.com/ Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: Report test parameter results as (K)TAP subtestsDavid Gow
Currently, the results for individial parameters in a parameterised test are simply output as (K)TAP diagnostic lines. As kunit_tool now supports nested subtests, report each parameter as its own subtest. For example, here's what the output now looks like: # Subtest: inode_test_xtimestamp_decoding ok 1 - 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits ok 2 - 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits ok 3 - 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits ok 4 - 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits ok 5 - 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on ok 6 - 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on ok 7 - 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on ok 8 - 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on ok 9 - 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on ok 10 - 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on ok 11 - 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on ok 12 - 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on ok 13 - 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns ok 14 - 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns ok 15 - 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on ok 16 - 2446-05-10 Upper bound of 32bit >=0 timestamp. All extra sec bits on # inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16 ok 1 - inode_test_xtimestamp_decoding Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-13kunit: Don't crash if no parameters are generatedDavid Gow
It's possible that a parameterised test could end up with zero parameters. At the moment, the test function will nevertheless be called with NULL as the parameter. Instead, don't try to run the test code, and just mark the test as SKIPped. Reported-by: Daniel Latypov <dlatypov@google.com> Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>