summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-26drm/amd/display: Fix IPS handshake for idle optimizationsNicholas Kazlauskas
[Why] Intermittent reboot hangs are observed introduced by "Improve x86 and dmub ips handshake". [How] Bring back the commit but fix the polling. Avoid hanging in place forever by bounding the delay and ensure that we still message DMCUB on IPS2 exit to notify driver idle has been cleared. Reviewed-by: Duncan Ma <duncan.ma@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: implement map dc pipe with callback in DML2Wenjing Liu
[why] Unify pipe resource management logic in dc resource layer. V2: Add default case for switch. CC: Hamza Mahfooz <hamza.mahfooz@amd.com> Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: add pipe resource management callbacks to DML2Wenjing Liu
[why] Need DML2 to support new pipe resource management APIs. Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Reduce default backlight min from 5 nits to 1 nitsSwapnil Patel
[Why & How] Currently set_default_brightness_aux function uses 5 nits as lower limit to check for valid default_backlight setting. However some newer panels can support even lower default settings Reviewed-by: Agustin Gutierrez <agustin.gutierrez@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Swapnil Patel <swapnil.patel@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Update SDP VSC colorimetry from DP test automation requestGeorge Shen
[Why] Certain test equipment vendors check the SDP VSC for colorimetry against the value from the test request during certain DP link layer tests for YCbCr test cases. [How] Update SDP VSC with colorimetry from test automation request. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Add a check for idle power optimizationSung Joon Kim
[why] Need a helper function to check idle power is allowed so that dc doesn't access any registers that are power-gated. [how] Implement helper function to check idle power optimization. Enable a hook to check if detection is allowed. V2: Add function hooks for set and get idle states. Check if function hook was properly initialized. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Reviewed-by: Nicholas Choi <nicholas.choi@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Sung Joon Kim <sungkim@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Revert "Improve x86 and dmub ips handshake"Nicholas Kazlauskas
This reverts commit 1288d702080949f87688d49dfeeacc99f40adc9b. Causes intermittent hangs during reboot stress testing. Reviewed-by: Duncan Ma <duncan.ma@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Fix MST Multi-Stream Not Lighting Up on dcn35Fangzhi Zuo
dcn35 misses .enable_symclk_se hook that makes MST DSC not functional when having multiple FE clk to be enabled. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26net/mlx5: fix uninit value usePrzemek Kitszel
Avoid use of uninitialized state variable. In case of mlx5e_tx_reporter_build_diagnose_output_sq_common() it's better to still collect other data than bail out entirely. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/netdev/8bd30131-c9f2-4075-a575-7fa2793a1760@moroto.mountain Fixes: d17f98bf7cc9 ("net/mlx5: devlink health: use retained error fmsg API") Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://lore.kernel.org/r/20231025145050.36114-1-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26drm/amd: Explicitly disable ASPM when dynamic switching disabledMario Limonciello
Currently there are separate but related checks: * amdgpu_device_should_use_aspm() * amdgpu_device_aspm_support_quirk() * amdgpu_device_pcie_dynamic_switching_supported() Simplify into checking whether DPM was enabled or not in the auto case. This works because amdgpu_device_pcie_dynamic_switching_supported() populates that value. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd: Move AMD_IS_APU check for ASPM into top level functionMario Limonciello
There is no need for every ASIC driver to perform the same check. Move the duplicated code into amdgpu_device_should_use_aspm(). Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd: Disable PP_PCIE_DPM_MASK when dynamic speed switching not supportedMario Limonciello
Rather than individual ASICs checking for the quirk, set the quirk at the driver level. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26Revert "drm/amdkfd: Use partial migrations in GPU page faults"Philip Yang
This reverts commit dc427a473e5d119232ddb27530920d9796cdea70. The change prevents migrating the entire range to VRAM because retry fault restore_pages map the remaining system memory range to GPUs. It will work correctly to submit together with partial mapping to GPU patch later. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26Revert "drm/amdkfd:remove unused code"Philip Yang
This reverts commit f9caf6cdd5cc1f4006fd7b6b113658c0b0159f23. Needed for the next revert patch. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Fix copyright notice in DC codeStylon Wang
[Why & How] Fix incomplete copyright notice in DC code. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Fix copyright notice in DML2 codeStylon Wang
[Why & How] Fix incomplete copyright notice in DML2 code. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Add missing copyright notice in DMUBStylon Wang
[Why & How] Add missing/incomplete copyright notice in DMUB files Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu remove restriction of sriov max_pfn on Vega10Lin.Cao
Remove restriction of sriov max_pfn so that TBA and TMA can move to high 47 bits address. Regression test: change range alloc flag of libdrm as AMDGPU_VA_RANGE_HIGH and there is no flr occur when testing amdgpu_test of drm. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdkfd: Address 'remap_list' not described in 'svm_range_add'Srinivasan Shanmugam
Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2073: warning: Function parameter or member 'remap_list' not described in 'svm_range_add' Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULLQu Huang
In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log: 1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 Signed-off-by: Qu Huang <qu.huang@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: bypass RAS error reset in some conditionsTao Zhou
PMFW is responsible for RAS error reset in some conditions, driver can skip the operation. v2: add check for ras->in_recovery, it's set earlier than amdgpu_in_reset. v3: fix error in gpu reset check. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: enable RAS poison mode for APUTao Zhou
Enable it by default on APU platform. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu/vpe: correct queue stop programingLang Yu
Otherwise IB test would fail during GPU reset. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Fix DMUB errors introduced by DML2Rodrigo Siqueira
When DML 2 was introduced, it changed part of the generic sequence of DC, which caused issues on previous DCNs with DMUB support. This commit ensures the new sequence only works for new DCNs from 3.5 and above. Changes since V1: - Harry: Use the attribute using_dml2 instead of check the DCN version. Cc: Vitaly Prosyak <vprosyak@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Qingqing Zhuo <Qingqing.Zhuo@amd.com> Cc: Daniel Wheeler <daniel.wheeler@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2") Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd: Disable ASPM for VI w/ all Intel systemsMario Limonciello
Originally we were quirking ASPM disabled specifically for VI when used with Alder Lake, but it appears to have problems with Rocket Lake as well. Like we've done in the case of dpm for newer platforms, disable ASPM for all Intel systems. Cc: stable@vger.kernel.org # 5.15+ Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Remove power sequencing checkAgustin Gutierrez
[Why] Some ASICs keep backlight powered on after dpms off command has been issued. [How] The check for no edp power sequencing was never going to pass. The value is never changed from what it is set by design. Cc: stable@vger.kernel.org # 6.1+ Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2765 Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Agustin Gutierrez <agustin.gutierrez@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: Set the DML2 attribute to false in all DCNs older than ↵Rodrigo Siqueira
version 3.5 When DML2 was introduced, it targeted only new DCN versions. For controlling which ASIC should use this new version of DML, it was introduced the using_dml2 attribute. To avoid ambiguities, this commit explicitly sets using_dml2 to false in all ASICs that do not support DML2. Cc: Vitaly Prosyak <vprosyak@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Qingqing Zhuo <Qingqing.Zhuo@amd.com> Cc: Daniel Wheeler <daniel.wheeler@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/pm: Fix the return value in default caseMa Jun
Fix the return value in default case and drop redundant 'break'. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: Add API to get full IP versionLijo Lazar
Fetch the full version of IP including variant and subrevision. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: add tmz support for GC IP v11.5.0Jiadong Zhu
Add tmz support for GC 11.5.0. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/pm: drop unneeded dpm features disablement for SMU 14.0.0Jiadong Zhu
PMFW will handle the features disablement properly for gpu reset case, driver involvement may cause some unexpected issues. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: modify if condition in nbio_v7_7.cLi Ma
remove unnecessary "enable" in if condition. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdkfd: Fix shift out-of-bounds issueJesse Zhang
[ 567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int' [ 567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G OE 6.2.0-34-generic #34~22.04.1-Ubuntu [ 567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [ 567.614504] Workqueue: events send_exception_work_handler [amdgpu] [ 567.614748] Call Trace: [ 567.614750] <TASK> [ 567.614753] dump_stack_lvl+0x48/0x70 [ 567.614761] dump_stack+0x10/0x20 [ 567.614763] __ubsan_handle_shift_out_of_bounds+0x156/0x310 [ 567.614769] ? srso_alias_return_thunk+0x5/0x7f [ 567.614773] ? update_sd_lb_stats.constprop.0+0xf2/0x3c0 [ 567.614780] svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [ 567.615047] ? srso_alias_return_thunk+0x5/0x7f [ 567.615052] svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [ 567.615286] do_swap_page+0x7b6/0xa30 [ 567.615291] ? srso_alias_return_thunk+0x5/0x7f [ 567.615294] ? __free_pages+0x119/0x130 [ 567.615299] handle_pte_fault+0x227/0x280 [ 567.615303] __handle_mm_fault+0x3c0/0x720 [ 567.615311] handle_mm_fault+0x119/0x330 [ 567.615314] ? lock_mm_and_find_vma+0x44/0x250 [ 567.615318] do_user_addr_fault+0x1a9/0x640 [ 567.615323] exc_page_fault+0x81/0x1b0 [ 567.615328] asm_exc_page_fault+0x27/0x30 [ 567.615332] RIP: 0010:__get_user_8+0x1c/0x30 Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: refine ras error kernel log printYang Wang
refine ras error kernel log to avoid user-ridden ambiguity. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amdgpu: fix find ras error node errorYang Wang
the origin function might return the wrong node. Fixes: 5b1270beb380 ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/display: reprogram det size while seamless bootHugo Hu
[Why] During system boot in second screen only mode on a seamless boot system, there is a chance that the pipe's det size might not be reset. [How] Reset the det size while resetting the pipe during seamless boot. Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Hugo Hu <hugo.hu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26drm/amd/pm: record mca debug mode in RASTao Zhou
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu v13_0_6. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27usb: typec: altmodes/displayport: fixup drm internal api change vs new user.Dave Airlie
usb: typec: altmodes/displayport: Signal hpd low when exiting mode and drm: Add HPD state to drm_connector_oob_hotplug_event() sideswiped each other. Signal disconnected always. Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26exportfs: Change bcachefs fid_type enum to avoid conflictsKent Overstreet
Per Amir Goldstein, the fid types that bcachefs picked conflicted with xfs and fuse, which previously were in use but not deviced in the master enum. Since bcachefs is still out of tree, we can move. https://lore.kernel.org/linux-next/20231026203733.fx65mjyic4pka3e5@moria.home.lan/T/#ma59f65ba61f605b593e69f4690dbd317526d83ba Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-26drm/amdgpu: move buffer funcs setting up a levelAlex Deucher
Rather than doing this in the IP code for the SDMA paging engine, move it up to the core device level init level. This should fix the scheduler init ordering. v2: drop extra parens v3: drop SDMA helpers v4: Added a Fixes tag because amdgpu dereferences an uninitialized scheduler without this patch, and this patch fixes this. (Luben) Tested-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231025171928.3318505-1-alexander.deucher@amd.com Acked-by: Christian König <christian.koenig@amd.com> Fixes: 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-10-26MAINTAINERS: Update the GPU Scheduler emailLuben Tuikov
Update the GPU Scheduler maintainer email. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231026174438.18427-2-ltuikov89@gmail.com
2023-10-26landlock: Document network supportKonstantin Meskhidze
Describe network access rules for TCP sockets. Add network access example in the tutorial. Add kernel configuration support for network. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-13-konstantin.meskhidze@huawei.com [mic: Update date, and do light cosmetic changes] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26samples/landlock: Support TCP restrictionsKonstantin Meskhidze
Add TCP restrictions to the sandboxer demo. It's possible to allow a sandboxer to bind/connect to a list of specified ports restricting network actions to the rest of them. This is controlled with the new LL_TCP_BIND and LL_TCP_CONNECT environment variables. Rename ENV_PATH_TOKEN to ENV_DELIMITER. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-12-konstantin.meskhidze@huawei.com [mic: Extend commit message] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26selftests/landlock: Add network testsKonstantin Meskhidze
Add 82 test suites to check edge cases related to bind() and connect() actions. They are defined with 6 fixtures and their variants: The "protocol" fixture is extended with 12 variants defined as a matrix of: sandboxed/not-sandboxed, IPv4/IPv6/unix network domain, and stream/datagram socket. 4 related tests suites are defined: * bind: Tests bind action. * connect: Tests connect action. * bind_unspec: Tests bind action with the AF_UNSPEC socket family. * connect_unspec: Tests connect action with the AF_UNSPEC socket family. The "ipv4" fixture is extended with 4 variants defined as a matrix of: sandboxed/not-sandboxed, and stream/datagram socket. 1 related test suite is defined: * from_unix_to_inet: Tests to make sure unix sockets' actions are not restricted by Landlock rules applied to TCP ones. The "tcp_layers" fixture is extended with 8 variants defined as a matrix of: IPv4/IPv6 network domain, and different number of landlock rule layers. 2 related tests suites are defined: * ruleset_overlap: Tests nested layers with less constraints. * ruleset_expand: Tests nested layers with more constraints. In the "mini" fixture 4 tests suites are defined: * network_access_rights: Tests handling of known access rights. * unknown_access_rights: Tests handling of unknown access rights. * inval: Tests unhandled allowed access and zero access value. * tcp_port_overflow: Tests with port values greater than 65535. The "ipv4_tcp" fixture supports IPv4 network domain with stream socket. 2 tests suites are defined: * port_endianness: Tests with big/little endian port formats. * with_fs: Tests a ruleset with both filesystem and network restrictions. The "port_specific" fixture is extended with 4 variants defined as a matrix of: sandboxed/not-sandboxed, IPv4/IPv6 network domain, and stream socket. 2 related tests suites are defined: * bind_connect_zero: Tests with port 0. * bind_connect_1023: Tests with port 1023. Test coverage for security/landlock is 92.4% of 710 lines according to gcc/gcov-13. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-11-konstantin.meskhidze@huawei.com [mic: Extend commit message, update test coverage, clean up capability use, fix useless TEST_F_FORK, and improve ipv4_tcp.with_fs] Co-developed-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26selftests/landlock: Share enforce_ruleset() helperKonstantin Meskhidze
Move enforce_ruleset() helper function to common.h so that it can be used both by filesystem tests and network ones. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-10-konstantin.meskhidze@huawei.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26landlock: Support network rules with TCP bind and connectKonstantin Meskhidze
Add network rules support in the ruleset management helpers and the landlock_create_ruleset() syscall. Extend user space API to support network actions: * Add new network access rights: LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP. * Add a new network rule type: LANDLOCK_RULE_NET_PORT tied to struct landlock_net_port_attr. The allowed_access field contains the network access rights, and the port field contains the port value according to the controlled protocol. This field can take up to a 64-bit value but the maximum value depends on the related protocol (e.g. 16-bit value for TCP). Network port is in host endianness [1]. * Add a new handled_access_net field to struct landlock_ruleset_attr that contains network access rights. * Increment the Landlock ABI version to 4. Implement socket_bind() and socket_connect() LSM hooks, which enable to control TCP socket binding and connection to specific ports. Expand access_masks_t from u16 to u32 to be able to store network access rights alongside filesystem access rights for rulesets' handled access rights. Access rights are not tied to socket file descriptors but checked at bind() or connect() call time against the caller's Landlock domain. For the filesystem, a file descriptor is a direct access to a file/data. However, for network sockets, we cannot identify for which data or peer a newly created socket will give access to. Indeed, we need to wait for a connect or bind request to identify the use case for this socket. Likewise a directory file descriptor may enable to open another file (i.e. a new data item), but this opening is also restricted by the caller's domain, not the file descriptor's access rights [2]. [1] https://lore.kernel.org/r/278ab07f-7583-a4e0-3d37-1bacd091531d@digikod.net [2] https://lore.kernel.org/r/263c1eb3-602f-57fe-8450-3f138581bee7@digikod.net Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-9-konstantin.meskhidze@huawei.com [mic: Extend commit message, fix typo in comments, and specify endianness in the documentation] Co-developed-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26landlock: Refactor landlock_add_rule() syscallKonstantin Meskhidze
Change the landlock_add_rule() syscall to support new rule types with next commits. Add the add_rule_path_beneath() helper to support current filesystem rules. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-8-konstantin.meskhidze@huawei.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26landlock: Refactor layer helpersKonstantin Meskhidze
Add a new key_type argument to the landlock_init_layer_masks() helper. Add a masks_array_size argument to the landlock_unmask_layers() helper. These modifications support implementing new rule types in the next Landlock versions. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-7-konstantin.meskhidze@huawei.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26landlock: Move and rename layer helpersKonstantin Meskhidze
Move and rename landlock_unmask_layers() and landlock_init_layer_masks() helpers to ruleset.c to share them with Landlock network implementation in following commits. Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-6-konstantin.meskhidze@huawei.com Signed-off-by: Mickaël Salaün <mic@digikod.net>