linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-11-29	drm/amd/display: set per pipe dppclk to 0 when dpp is off	Dmytro Laktyushkin
	The 'commit 52e4fdf09ebc ("drm/amd/display: use low clocks for no plane configs")' introduced a change that set low clock values for DCN31 and DCN32. As a result of these changes, DC started to spam the log with the following warning: ------------[ cut here ]------------ WARNING: CPU: 8 PID: 1486 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_dccg.c:58 dccg2_update_dpp_dto+0x3f/0xf0 [amdgpu] [..] CPU: 8 PID: 1486 Comm: kms_atomic Tainted: G W 5.18.0+ #1 RIP: 0010:dccg2_update_dpp_dto+0x3f/0xf0 [amdgpu] RSP: 0018:ffffbbd8025334d0 EFLAGS: 00010206 RAX: 00000000000001ee RBX: ffffa02c87dd3de0 RCX: 00000000000a7f80 RDX: 000000000007dec3 RSI: 0000000000000000 RDI: ffffa02c87dd3de0 RBP: ffffbbd8025334e8 R08: 0000000000000001 R09: 0000000000000005 R10: 00000000000331a0 R11: ffffffffc0b03d80 R12: ffffa02ca576d000 R13: ffffa02cd02c0000 R14: 00000000001453bc R15: ffffa02cdc280000 [..] dcn20_update_clocks_update_dpp_dto+0x4e/0xa0 [amdgpu] dcn32_update_clocks+0x5d9/0x650 [amdgpu] dcn20_prepare_bandwidth+0x49/0x100 [amdgpu] dcn30_prepare_bandwidth+0x63/0x80 [amdgpu] dc_commit_state_no_check+0x39d/0x13e0 [amdgpu] dc_commit_streams+0x1f9/0x3b0 [amdgpu] dc_commit_state+0x37/0x120 [amdgpu] amdgpu_dm_atomic_commit_tail+0x5e5/0x2520 [amdgpu] ? _raw_spin_unlock_irqrestore+0x1f/0x40 ? down_trylock+0x2c/0x40 ? vprintk_emit+0x186/0x2c0 ? vprintk_default+0x1d/0x20 ? vprintk+0x4e/0x60 We can easily trigger this issue by using a 4k@120 or a 2k@165 and running some of the kms_atomic tests. This warning is triggered because the per-pipe clock update is not happening; this commit fixes this issue by ensuring that DPPCLK is updated when calculating the watermark and dlg is invoked. Fixes: 2641c7b78081 ("drm/amd/display: use low clocks for no plane configs") Reported-by: Mark Broadworth <mark.broadworth@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/display: Fix race condition in DPIA AUX transfer	Stylon Wang
	[Why] This fix was intended for improving on coding style but in the process uncovers a race condition, which explains why we are getting incorrect length in DPIA AUX replies. Due to the call path of DPIA AUX going from DC back to DM layer then again into DC and the added complexities on top of current DC AUX implementation, a proper fix to rely on current dc_lock to address the race condition is difficult without a major overhual on how DPIA AUX is implemented. [How] - Add a mutex dpia_aux_lock to protect DPIA AUX transfers - Remove DMUB_ASYNC_TO_SYNC_ACCESS_* codes and rely solely on aux_return_code_type for error reporting and handling - Separate SET_CONFIG from DPIA AUX transfer because they have quiet different processing logic - Remove unnecessary type casting to and from void * type Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/display: MALL SS calculations should iterate over all pipes for cursor	Dillon Varone
	[Description] MALL SS allocation calculations should iterate over all pipes to determine the the allocation size required for HW cursor. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/display: 3.2.214	Aric Cyr
	This version brings along following fixes: -Program output transfer function when required -Fix arthmetic errror in MALL size caluclations for subvp -DCC Meta pitch used for MALL allocation -Debugfs entry to tell if connector is DPIA link -Use largest vready_offset in pipe group -Fixes race condition in DPIA Aux transfer Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/display: Use the largest vready_offset in pipe group	Wesley Chalmers
	[WHY] Corruption can occur in LB if vready_offset is not large enough. DML calculates vready_offset for each pipe, but we currently select the top pipe's vready_offset, which is not necessarily enough for all pipes in the group. [HOW] Wherever program_global_sync is currently called, iterate through the entire pipe group and find the highest vready_offset. Reviewed-by: Dillon Varone <Dillon.Varone@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/radeon: Fix PCI device refcount leak in radeon_atrm_get_bios()	Xiongfeng Wang
	As comment of pci_get_class() says, it returns a pci_device with its refcount increased and decreased the refcount for the input parameter @from if it is not NULL. If we break the loop in radeon_atrm_get_bios() with 'pdev' not NULL, we need to call pci_dev_put() to decrease the refcount. Add the missing pci_dev_put() to avoid refcount leak. Fixes: d8ade3526b2a ("drm/radeon: handle non-VGA class pci devices with ATRM") Fixes: c61e2775873f ("drm/radeon: split ATRM support out from the ATPX handler (v3)") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()	Dan Carpenter
	We re-arranged this code recently so "ret" is always zero at this point. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	swsmu/amdgpu_smu: Fix the wrong if-condition	Yu Songping
	The logical operator '&&' will make smu->ppt_funcs->set_gfx_power_up_by_imu segment fault when smu->ppt_funcs is NULL. Signed-off-by: Yu Songping <yusongping@huawei.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: remove redundant NULL check	Yushan Zhou
	release_firmware() checks whether firmware pointer is NULL. Remove the redundant NULL check in psp_sw_fini(). Signed-off-by: Yushan Zhou <katrinzhou@tencent.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	amdgpu/nv.c: Corrected typo in the video capabilities resolution	Veerabadhran Gopalakrishnan
	Corrected the typo in the 4K resolution parameters. Fixes: b3a24461f9fb15 ("amdgpu/nv.c - Added codec query for Beige Goby") Fixes: 9075096b09e590 ("amdgpu/nv.c - Optimize code for video codec support structure") Fixes: 9ac0edaa0f8323 ("drm/amdgpu: add vcn_4_0_0 video codec query") Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: Fix potential double free and null pointer dereference	Liang He
	In amdgpu_get_xgmi_hive(), we should not call kfree() after kobject_put() as the PUT will call kfree(). In amdgpu_device_ip_init(), we need to check the returned hive which can be NULL before we dereference it. Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: enable PSP IP v13.0.11 support	Tim Huang
	Enable PSP FW loading for PSP IP v13.0.11 Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: Enable pg/cg flags on GC11_0_4 for VCN	Saleemkhan Jamadar
	This enable VCN PG, CG and JPEG PG, CG Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: enable nbio support for NBIO v7.7.1	Yifan Zhang
	this patch is to enable nbio support for NBIO v7.7.1. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/pm: use the specific mailbox registers only for SMU IP v13.0.4	Tim Huang
	The SMU IP v13.0.4 ppt interface is shared by IP v13.0.11, they use the different mailbox register offset. So use the specific mailbox registers offset for v13.0.4. Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.11	Tim Huang
	Set the default reset method to mode2 for SMU IP v13.0.11 Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/pm: add GFXOFF control IP version check for SMU IP v13.0.11	Yifan Zhang
	Enable the SMU IP v13.0.11 GFXOFF control Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add smu 13 support for smu 13.0.11	Yifan Zhang
	this patch to add smu 13 support for smu 13.0.11. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/pm: enable swsmu for SMU IP v13.0.11	Yifan Zhang
	Add the entry to set the ppt functions for SMU IP v13.0.11. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdkfd: add GC 11.0.4 KFD support	Yifan Zhang
	Add initial support for GC 11.0.4 in KFD compute driver. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add gmc v11 support for GC 11.0.4	Yifan Zhang
	Add gmc v11 support for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add soc21 common ip block support for GC 11.0.4	Yifan Zhang
	Add common soc21 ip block support for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add gfx support for GC 11.0.4	Yifan Zhang
	this patch to add GC 11.0.4 gfx support to gfx11 implementation. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: set the APU flag for GC 11.0.4	Yifan Zhang
	Set the APU flag appropriately for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: set GC 11.0.4 family	Yifan Zhang
	this patch is to set GC 11.0.4 family. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: add PSP IP v13.0.11 support	Tim Huang
	Add PSP IP v13.0.11 ip discovery support. Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: enable mes support for GC v11.0.4	Yifan Zhang
	this patch is to enable mes for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: enable gfx v11 for GC 11.0.4	Yifan Zhang
	Enable gfx v11 for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: enable gmc v11 for GC 11.0.4	Yifan Zhang
	Enable gmc (graphic memory controller) v11 for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu/discovery: enable soc21 common for GC 11.0.4	Yifan Zhang
	Enable soc21 common for GC 11.0.4. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: skip vram reserve on firmware_v2_2 for bare-metal	Likun Gao
	vram_usagebyfirmware v2_2 is only used in SRIOV case, skip the related settings in bare-metal case currently. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add printing to indicate rpm completeness	Guchun Chen
	Add an explicit printing to tell when finishing rpm execution in amdgpu. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/pm/smu11: poll BACO status after RPM BACO exits	Guchun Chen
	After executing BACO exit, driver needs to poll the status to ensure FW has completed BACO exit sequence to prevent timing issue. v2: use usleep_range to replace msleep to fix checkpatch.pl warnings Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amd/pm/smu11: BACO is supported when it's in BACO state	Guchun Chen
	Return true early if ASIC is in BACO state already, no need to talk to SMU. It can fix the issue that driver was not calling BACO exit at all in runtime pm resume, and a timing issue leading to a PCI AER error happened eventually. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: add drv_vram_usage_va for virt data exchange	Tong Liu01
	For vram_usagebyfirmware_v2_2, fw_vram_reserve is not used. So fw_vram_usage_va is NULL, and cannot do virt data exchange anymore. Should add drv_vram_usage_va to do virt data exchange in vram_usagebyfirmware_v2_2 case. And refine some code style checks in pre add vram reservation logic patch Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	drm/amdgpu: fix stall on CPU when allocate large system memory	James Zhu
	-v2: 1. rename variable to redue confuse 2. optimize the code -v3: move new define out of the middle of the code -v4: squash in minmax error fix (Luben) When applications try to allocate large system (more than > 128GB), "stall cpu" is reported. for such large system memory, walk_page_range takes more than 20s usually. The warning message can be removed when splitting hmm range into smaller ones which is not more 64GB for each walk_page_range. [ 164.437617] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1753: amdgpu: create BO VA 0x7f63c7a00000 size 0x2f16000000 domain CPU [ 164.488847] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1785: amdgpu: creating userptr BO for user_addr = 7f63c7a00000 [ 185.439116] rcu: INFO: rcu_sched self-detected stall on CPU [ 185.439125] rcu: 8-....: (20999 ticks this GP) idle=e22/1/0x4000000000000000 softirq=2242/2242 fqs=5249 [ 185.439137] (t=21000 jiffies g=6325 q=1215) [ 185.439141] NMI backtrace for cpu 8 [ 185.439143] CPU: 8 PID: 3470 Comm: kfdtest Kdump: loaded Tainted: G O 5.12.0-0_fbk5_zion_rc1_5697_g2c723fb88626 #1 [ 185.439147] Hardware name: HPE ProLiant XL675d Gen10 Plus/ProLiant XL675d Gen10 Plus, BIOS A47 11/06/2020 [ 185.439150] Call Trace: [ 185.439153] <IRQ> [ 185.439157] dump_stack+0x64/0x7c [ 185.439163] nmi_cpu_backtrace.cold.7+0x30/0x65 [ 185.439165] ? lapic_can_unplug_cpu+0x70/0x70 [ 185.439170] nmi_trigger_cpumask_backtrace+0xf9/0x100 [ 185.439174] rcu_dump_cpu_stacks+0xc5/0xf5 [ 185.439178] rcu_sched_clock_irq.cold.97+0x112/0x38c [ 185.439182] ? tick_sched_handle.isra.21+0x50/0x50 [ 185.439185] update_process_times+0x8c/0xc0 [ 185.439189] tick_sched_timer+0x63/0x70 [ 185.439192] __hrtimer_run_queues+0xff/0x250 [ 185.439195] hrtimer_interrupt+0xf4/0x200 [ 185.439199] __sysvec_apic_timer_interrupt+0x51/0xd0 [ 185.439201] sysvec_apic_timer_interrupt+0x69/0x90 [ 185.439206] </IRQ> [ 185.439207] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 185.439211] RIP: 0010:clear_page_rep+0x7/0x10 [ 185.439214] Code: e8 fe 7c 51 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 [ 185.439218] RSP: 0018:ffffc9000f58f818 EFLAGS: 00000246 [ 185.439220] RAX: 0000000000000000 RBX: 0000000000000881 RCX: 000000000000005c [ 185.439223] RDX: 0000000000100dca RSI: 0000000000000000 RDI: ffff88a59e0e5d20 [ 185.439225] RBP: ffffea0096783940 R08: ffff888118c35280 R09: ffffea0096783940 [ 185.439227] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0096783980 [ 185.439228] R13: ffffea0096783940 R14: ffff88b07fdfdd00 R15: 0000000000000000 [ 185.439232] prep_new_page+0x81/0xc0 [ 185.439236] get_page_from_freelist+0x13be/0x16f0 [ 185.439240] ? release_pages+0x16a/0x4a0 [ 185.439244] __alloc_pages_nodemask+0x1ae/0x340 [ 185.439247] alloc_pages_vma+0x74/0x1e0 [ 185.439251] __handle_mm_fault+0xafe/0x1360 [ 185.439255] handle_mm_fault+0xc3/0x280 [ 185.439257] hmm_vma_fault.isra.22+0x49/0x90 [ 185.439261] __walk_page_range+0x692/0x9b0 [ 185.439265] walk_page_range+0x9b/0x120 [ 185.439269] hmm_range_fault+0x4f/0x90 [ 185.439274] amdgpu_hmm_range_get_pages+0x24f/0x260 [amdgpu] [ 185.439463] amdgpu_ttm_tt_get_user_pages+0xc2/0x190 [amdgpu] [ 185.439603] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x49f/0x7a0 [amdgpu] [ 185.439774] kfd_ioctl_alloc_memory_of_gpu+0xfb/0x410 [amdgpu] Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29	arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()	Mark Brown
	For reasons that are unclear to this reader fpsimd_bind_state_to_cpu() populates the struct fpsimd_last_state_struct that it uses to store the active floating point state for KVM guests by passing an argument for each member of the structure. As the richness of the architecture increases this is resulting in a function with a rather large number of arguments which isn't ideal. Simplify the interface by using the struct directly as the single argument for the function, renaming it as we lift the definition into the header. This could be built on further to reduce the work we do adding storage for new FP state in various places but for now it just simplifies this one interface. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-9-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/sve: Leave SVE enabled on syscall if we don't context switch	Mark Brown
	The syscall ABI says that the SVE register state not shared with FPSIMD may not be preserved on syscall, and this is the only mechanism we have in the ABI to stop tracking the extra SVE state for a process. Currently we do this unconditionally by means of disabling SVE for the process on syscall, causing userspace to take a trap to EL1 if it uses SVE again. These extra traps result in a noticeable overhead for using SVE instead of FPSIMD in some workloads, especially for simple syscalls where we can return directly to userspace and would not otherwise need to update the floating point registers. Tests with fp-pidbench show an approximately 70% overhead on a range of implementations when SVE is in use - while this is an extreme and entirely artificial benchmark it is clear that there is some useful room for improvement here. Now that we have the ability to track the decision about what to save seprately to TIF_SVE we can improve things by leaving TIF_SVE enabled on syscall but only saving the FPSIMD registers if we are in a syscall. This means that if we need to restore the register state from memory (eg, after a context switch or kernel mode NEON) we will drop TIF_SVE and reenable traps for userspace but if we can just return to userspace then traps will remain disabled. Since our current implementation and hence ABI has the effect of zeroing all the SVE register state not shared with FPSIMD on syscall we replace the disabling of TIF_SVE with a flush of the non-shared register state, this means that there is still some overhead for syscalls when SVE is in use but it is very much reduced. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-8-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/fpsimd: SME no longer requires SVE register state	Mark Brown
	Now that we track the type of the stored register state separately to what is active in the task, it is valid to have the FPSIMD register state stored while in streaming mode. Remove the special case handling for SME when setting FPSIMD register state. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-7-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/fpsimd: Load FP state based on recorded data type	Mark Brown
	Now that we are recording the type of floating point register state we are saving when we write the register state out to memory we can use that information when we load from memory to decide which format to load, bringing TIF_SVE into line with what we saved rather than relying on TIF_SVE to determine what to load. The SME state details are already recorded directly in the saved SVCR and handled based on the information there. Since we are not changing any of the save paths there should be no functional change from this patch, further patches will make use of this to optimise and clarify the code. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-6-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM	Mark Brown
	Now that we are explicitly telling the host FP code which register state it needs to save we can remove the manipulation of TIF_SVE from the KVM code, simplifying it and allowing us to optimise our handling of normal tasks. Remove the manipulation of TIF_SVE from KVM and instead rely on to_save to ensure we save the correct data for it. There should be no functional or performance impact from this change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-5-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/fpsimd: Have KVM explicitly say which FP registers to save	Mark Brown
	In order to avoid needlessly saving and restoring the guest registers KVM relies on the host FPSMID code to save the guest registers when we context switch away from the guest. This is done by binding the KVM guest state to the CPU on top of the task state that was originally there, then carefully managing the TIF_SVE flag for the task to cause the host to save the full SVE state when needed regardless of the needs of the host task. This works well enough but isn't terribly direct about what is going on and makes it much more complicated to try to optimise what we're doing with the SVE register state. Let's instead have KVM pass in the register state it wants saving when it binds to the CPU. We introduce a new FP_STATE_CURRENT for use during normal task binding to indicate that we should base our decisions on the current task. This should not be used when actually saving. Ideally we might want to use a separate enum for the type to save but this enum and the enum values would then need to be named which has problems with clarity and ambiguity. In order to ease any future debugging that might be required this patch does not actually update any of the decision making about what to save, it merely starts tracking the new information and warns if the requested state is not what we would otherwise have decided to save. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-4-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE	Mark Brown
	When we save the state for the floating point registers this can be done in the form visible through either the FPSIMD V registers or the SVE Z and P registers. At present we track which format is currently used based on TIF_SVE and the SME streaming mode state but particularly in the SVE case this limits our options for optimising things, especially around syscalls. Introduce a new enum which we place together with saved floating point state in both thread_struct and the KVM guest state which explicitly states which format is active and keep it up to date when we change it. At present we do not use this state except to verify that it has the expected value when loading the state, future patches will introduce functional changes. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-3-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	KVM: arm64: Discard any SVE state when entering KVM guests	Mark Brown
	Since 8383741ab2e773a99 (KVM: arm64: Get rid of host SVE tracking/saving) KVM has not tracked the host SVE state, relying on the fact that we currently disable SVE whenever we perform a syscall. This may not be true in future since performance optimisation may result in us keeping SVE enabled in order to avoid needing to take access traps to reenable it. Handle this by clearing TIF_SVE and converting the stored task state to FPSIMD format when preparing to run the guest. This is done with a new call fpsimd_kvm_prepare() to keep the direct state manipulation functions internal to fpsimd.c. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-2-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	Merge tag 'at91-fixes-6.1-3' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes AT91 fixes for 6.1 #3 It contains: - build fix for SAMA5D3 devices which don't have an L2 cache and due to this accesssing outer_cache.write_sec in sama5_secure_cache_init() could throw undefined reference to `outer_cache' if CONFIG_OUTER_CACHE is disabled from common sama5_defconfig. * tag 'at91-fixes-6.1-3' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: at91: fix build for SAMA5D3 w/o L2 cache Link: https://lore.kernel.org/r/20221125093521.382105-1-claudiu.beznea@microchip.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-29	ASoC: qcom: lpass-sc7180: Add maybe_unused tag for system PM ops	Srinivasa Rao Mandadapu
	Add __maybe_unused tag for system PM ops suspend and resume. This is required to fix allmodconfig compilation issue. Fixes: a3a96e93cc88 ("ASoC: qcom: lpass-sc7280: Add system suspend/resume PM ops") Signed-off-by: Srinivasa Rao Mandadapu <quic_srivasam@quicinc.com> Link: https://lore.kernel.org/r/1669726428-3140-1-git-send-email-quic_srivasam@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-29	random: remove extraneous period and add a missing one in comments	Jason A. Donenfeld
	Just some trivial typo fixes, and reflowing of lines. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-29	drivers/perf: hisi: Add TLP filter support	Yicong Yang
	The PMU support to filter the TLP when counting the bandwidth with below options: - only count the TLP headers - only count the TLP payloads - count both TLP headers and payloads In the current driver it's default to count the TLP payloads only, which will have an implicity side effects that on the traffic only have header only TLPs, we'll get no data. Make this user configuration through "len_mode" parameter and make it default to count both TLP headers and payloads when user not specified. Also update the documentation for it. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	Documentation: perf: Indent filter options list of hisi-pcie-pmu	Bagas Sanjaya
	The "Filter options" list have a rather ugly indentation. Also, the first paragraph after list name is rendered without separator (as continuation from the name). Align the list by indenting the list items and add a blank line separator for each list name. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20221117084136.53572-4-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29	docs: perf: Fix PMU instance name of hisi-pcie-pmu	Yicong Yang
	The PMU instance will be called hisi_pcie<sicl>_core<core> rather than hisi_pcie<sicl>_<core>. Fix this in the documentation. Fixes: c8602008e247 ("docs: perf: Add description for HiSilicon PCIe PMU driver") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20221117084136.53572-3-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>