summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-15net: ethernet: rtsn: fix potential memory leak in rtsn_start_xmit()Wang Hai
The rtsn_start_xmit() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb_any() to fix it. Fixes: b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014144250.38802-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: xilinx: axienet: fix potential memory leak in axienet_start_xmit()Wang Hai
The axienet_start_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb_any() to fix it. Fixes: 71791dc8bdea ("net: axienet: Check for DMA mapping errors") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20241014143704.31938-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'mptcp-prevent-mpc-handshake-on-port-based-signal-endpoints'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: prevent MPC handshake on port-based signal endpoints MPTCP connection requests toward a listening socket created by the in-kernel PM for a port based signal endpoint will never be accepted, they need to be explicitly rejected. - Patch 1: Explicitly reject such requests. A fix for >= v5.12. - Patch 2: Cover this case in the MPTCP selftests to avoid regressions. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> v1: https://lore.kernel.org/20240908180620.822579-1-xiyou.wangcong@gmail.com Link: https://lore.kernel.org/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com ==================== Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-0-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15selftests: mptcp: join: test for prohibited MPC to port-based endpPaolo Abeni
Explicitly verify that MPC connection attempts towards a port-based signal endpoint fail with a reset. Note that this new test is a bit different from the other ones, not using 'run_tests'. It is then needed to add the capture capability, and the picking the right port which have been extracted into three new helpers. The info about the capture can also be printed from a single point, which simplifies the exit paths in do_transfer(). The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-2-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15mptcp: prevent MPC handshake on port-based signal endpointsPaolo Abeni
Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetidLi RongQing
pnetid of pi (not newly allocated pe) should be compared Fixes: e888a2e8337c ("net/smc: introduce list of pnetids for Ethernet devices") Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://patch.msgid.link/20241014115321.33234-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: macb: Avoid 20s boot delay by skipping MDIO bus registration for ↵Oleksij Rempel
fixed-link PHY A boot delay was introduced by commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node"). This delay was caused by the call to `mdiobus_register()` in cases where a fixed-link PHY was present. The MDIO bus registration triggered unnecessary PHY address scans, leading to a 20-second delay due to attempts to detect Clause 45 (C45) compatible PHYs, despite no MDIO bus being attached. The commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node") was originally introduced to fix a regression caused by commit 7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings like: mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0 ... mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31 This patch reworks the logic to avoid registering and allocation of the MDIO bus when: - The device tree contains a fixed-link node. - There is no "mdio" child node in the device tree. If a child node named "mdio" exists, the MDIO bus will be registered to support PHYs attached to the MACB's MDIO bus. Otherwise, with only a fixed-link, the MDIO bus is skipped. Tested on a sama5d35 based system with a ksz8863 switch attached to macb0. Fixes: 79540d133ed6 ("net: macb: Fix handling of fixed-link node") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241013052916.3115142-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()Wang Hai
The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb() to fix it. Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20241012110434.49265-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15netdevsim: use cond_resched() in nsim_dev_trap_report_work()Eric Dumazet
I am still seeing many syzbot reports hinting that syzbot might fool nsim_dev_trap_report_work() with hundreds of ports [1] Lets use cond_resched(), and system_unbound_wq instead of implicit system_wq. [1] INFO: task syz-executor:20633 blocked for more than 143 seconds. Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:25856 pid:20633 tgid:20633 ppid:1 flags:0x00004006 ... NMI backtrace for cpu 1 CPU: 1 UID: 0 PID: 16760 Comm: kworker/1:0 Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events nsim_dev_trap_report_work RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:210 Code: 89 fb e8 23 00 00 00 48 8b 3d 04 fb 9c 0c 48 89 de 5b e9 c3 c7 5d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0c 25 c0 d7 03 00 65 8b 15 60 f0 RSP: 0018:ffffc90000a187e8 EFLAGS: 00000246 RAX: 0000000000000100 RBX: ffffc90000a188e0 RCX: ffff888027d3bc00 RDX: ffff888027d3bc00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88804a2e6000 R08: ffffffff8a4bc495 R09: ffffffff89da3577 R10: 0000000000000004 R11: ffffffff8a4bc2b0 R12: dffffc0000000000 R13: ffff88806573b503 R14: dffffc0000000000 R15: ffff8880663cca00 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc90a747f98 CR3: 000000000e734000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 000000000000002b DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 spin_unlock_bh include/linux/spinlock.h:396 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:820 [inline] nsim_dev_trap_report_work+0x75d/0xaa0 drivers/net/netdevsim/dev.c:850 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: ba5e1272142d ("netdevsim: avoid potential loop in nsim_dev_trap_report_work()") Reported-by: syzbot+d383dc9579a76f56c251@syzkaller.appspotmail.com Reported-by: syzbot+c596faae21a68bf7afd0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241012094230.3893510-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15macsec: don't increment counters for an unrelated SASabrina Dubroca
On RX, we shouldn't be incrementing the stats for an arbitrary SA in case the actual SA hasn't been set up. Those counters are intended to track packets for their respective AN when the SA isn't currently configured. Due to the way MACsec is implemented, we don't keep counters unless the SA is configured, so we can't track those packets, and those counters will remain at 0. The RXSC's stats keeps track of those packets without telling us which AN they belonged to. We could add counters for non-existent SAs, and then find a way to integrate them in the dump to userspace, but I don't think it's worth the effort. Fixes: 91ec9bd57f35 ("macsec: Fix traffic counters/statistics") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/f5ac92aaa5b89343232615f4c03f9f95042c6aa0.1728657709.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15gpu: host1x: Set up device DMA parametersThierry Reding
In order to store device DMA parameters, the DMA framework depends on the device's dma_parms field to point at a valid memory location. Add backing storage for this in struct host1x_memory_context and point to it. Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240916133320.368620-1-thierry.reding@gmail.com (cherry picked from commit b4ad4ef374d66cc8df3188bb1ddb65bce5fc9e50) Signed-off-by: Thierry Reding <treding@nvidia.com>
2024-10-15drm/amdgpu/swsmu: Only force workload setup on initAlex Deucher
Needed to set the workload type at init time so that we can apply the navi3x margin optimization. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 580ad7cbd4b7be8d2cb5ab5c1fca6bb76045eb0e) Cc: stable@vger.kernel.org
2024-10-15drm/radeon: Fix encoder->possible_clonesVille Syrjälä
Include the encoder itself in its possible_clones bitmask. In the past nothing validated that drivers were populating possible_clones correctly, but that changed in commit 74d2aacbe840 ("drm: Validate encoder->possible_clones"). Looks like radeon never got the memo and is still not following the rules 100% correctly. This results in some warnings during driver initialization: Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x4 (full encoder mask=0x7) WARNING: CPU: 0 PID: 170 at drivers/gpu/drm/drm_mode_config.c:615 drm_mode_config_validate+0x113/0x39c ... Cc: Alex Deucher <alexander.deucher@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: 74d2aacbe840 ("drm: Validate encoder->possible_clones") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/dri-devel/20241009000321.418e4294@yea/ Tested-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3b6e7d40649c0d75572039aff9d0911864c689db) Cc: stable@vger.kernel.org
2024-10-15drm/amdgpu/smu13: always apply the powersave optimizationAlex Deucher
It can avoid margin issues in some very demanding applications. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62f38b4ccaa6aa063ca781d80b10aacd39dc5c76) Cc: stable@vger.kernel.org
2024-10-15drm/amdkfd: Accounting pdd vram_usage for svmPhilip Yang
Process device data pdd->vram_usage is read by rocm-smi via sysfs, this is currently missing the svm_bo usage accounting, so "rocm-smi --showpids" per process VRAM usage report is incorrect. Add pdd->vram_usage accounting when svm_bo allocation and release, change to atomic64_t type because it is updated outside process mutex now. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 98c0b0efcc11f2a5ddf3ce33af1e48eedf808b04)
2024-10-15drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ringSrinivasan Shanmugam
This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bfaf1883605fd0c0dbabacd67ed49708470d5ea4)
2024-10-15drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipesMichael Chen
With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 144df260f3daab42c4611021f929b3342de516e5) Cc: stable@vger.kernel.org # 6.11.x
2024-10-15drm/amdgpu: prevent BO_HANDLES error from being overwrittenMohammed Anees
Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: fec5f8e8c6bc ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 40f2cd98828f454bdc5006ad3d94330a5ea164b7) Cc: stable@vger.kernel.org
2024-10-15drm/amdgpu: enable enforce_isolation sysfs node on VFsAlex Deucher
It should be enabled on both bare metal and VFs. Fixes: e189be9b2e38 ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit dc8847b054fd6679866ed4ee861e069e54c10799)
2024-10-15nvme-multipath: defer partition scanningKeith Busch
We need to suppress the partition scan from occuring within the controller's scan_work context. If a path error occurs here, the IO will wait until a path becomes available or all paths are torn down, but that action also occurs within scan_work, so it would deadlock. Defer the partion scan to a different context that does not block scan_work. Reported-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-10-15drm/amdgpu/swsmu: add automatic parameter to set_soft_freq_rangeAlex Deucher
On chips that support it, you can specificy 0 and 0xffff for min and max and the PMFW will use that to determine the optimal min and max. This enables optimal performance when the user manually switches between performance levels using sysfs. Previously we'd set soft min/max which could limit performance. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu: Fix off by one in current_memory_partition_show()Dan Carpenter
The >= ARRAY_SIZE() should be > ARRAY_SIZE() to prevent an out of bounds read. Fixes: 012be6f22c01 ("drm/amdgpu: Add sysfs interfaces for NPS mode") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUsAlex Deucher
This uses more aggressive hueristics than the the bootup default profile. On windows the OS has a special fullscreen 3D mode where this is used. Since we don't have the equivalent on Linux default to this profile for dGPUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1500 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu/swsmu: Only force workload setup on initAlex Deucher
Needed to set the workload type at init time so that we can apply the navi3x margin optimization. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/radeon: Fix encoder->possible_clonesVille Syrjälä
Include the encoder itself in its possible_clones bitmask. In the past nothing validated that drivers were populating possible_clones correctly, but that changed in commit 74d2aacbe840 ("drm: Validate encoder->possible_clones"). Looks like radeon never got the memo and is still not following the rules 100% correctly. This results in some warnings during driver initialization: Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x4 (full encoder mask=0x7) WARNING: CPU: 0 PID: 170 at drivers/gpu/drm/drm_mode_config.c:615 drm_mode_config_validate+0x113/0x39c ... Cc: Alex Deucher <alexander.deucher@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: 74d2aacbe840 ("drm: Validate encoder->possible_clones") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/dri-devel/20241009000321.418e4294@yea/ Tested-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu/smu13: always apply the powersave optimizationAlex Deucher
It can avoid margin issues in some very demanding applications. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: 3.2.305Aric Cyr
- Add sharpening policy to plane state - Clear pipe pointers on pipe reset - Resolve correct MALL size for dcn401 - Read Sink emission rate capability - IPX fixes - Coverity fixes Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: disable dynamic ODM when sharpness is enabledSamson Tam
[Why & How] Disable dynamic ODM when sharpness is enabled Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Check returned status from core_link_read_dpcdAlex Hung
[WHAT] The function core_link_read_dpcd returns status which is not used at all, making them useless assignments. [HOW] Print error messages if core_link_read_dpcd does not return DC_OK. This fixes 2 UNUSED_VALUE issues reported by Coverity. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Clear pipe pointers on pipe resetJoshua Aberback
[Why] We want to clean up unnecessary asserts, one of which is an assert in resource_is_pipe_type that fires if a pipe has no stream and still has pointers to other pipes ("dangling state"). This gets hit because pipes are not properly cleaned up in reset_back_end_for_pipe. When resetting a pipe, the existing MPCC / ODM combine pointers are no longer valid, especially when we put ODM in bypass. [How] - reset pipe pointers in reset_back_end_for_pipe - remove useless code to avoid confusion (a long time ago it had a reason to be there, not anymore) Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: add sharpening policy to plane stateSamson Tam
[Why] Pass in sharpening policy through plane state from control side [How] Add sharpener support through dc_caps. Add sharpen policy to plane state and move to spl_input. Pass sharpen policy from plane state to SPL. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: resolve correct MALL size for dcn401Dillon Varone
[WHY] Code for dcn401 to calculate available MALL size for display was shared with dcn32 and did not provide the correct result for all ASICs. [HOW] Add dcn401 specific function to properly calculate the available MALL for display. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Read Sink emission rate capabilityRobin Chen
[WHY] To get sink emission rate information for future supported refresh rate calculation. Reviewed-by: ChunTao Tso <chuntao.tso@amd.com> Signed-off-by: Robin Chen <robin.chen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: print messages when programming shaper/3dlut failsAlex Hung
[WHAT & HOW] Print error messages when programming shaper lut or 3dlut fails. This fixes 5 UNUSED_VALUE issues reported by Coverity. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Full exit out of IPS2 when all allow signals have been clearedLeo Chen
[Why] A race condition occurs between cursor movement and vertical interrupt control thread from OS, with both threads trying to exit IPS2. Vertical interrupt control thread clears the prev driver allow signal while not fully finishing the IPS2 exit process. [How] We want to detect all the allow signals have been cleared before we perform the full exit. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Fix Coverity change for visual confirmLeo (Hanghong) Ma
[Why && How] Previous change for Coverity has caused regression on visual confirm so fix it by reverting the part that affects visual confirm. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Add 3DLUT FL HW bug workaroundIlya Bakoulin
[Why] There is a known HW bug that causes the internal 3DLUT fetch signal to be lost at VREADY, regardless of whether the OTG lock is being held or not. A workaround is necessary to make sure that this internal signal stays up after OTG unlock. [How] Set the 3DLUT_ENABLE bit immediately before and after the unlock. Also use VUPDATE_KEEPOUT to prevent lock transition in the region between VSTARTUP and VREADY, which could cause issues with this WA sequence. Also including misc. 3DLUT DMA-related sequence fixes to address a few regressions causing corruption. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Add check for headless for idle optimizationRoman Li
[Why] Currently idle worker thread that checks for HPD while system is in IPS2 only supports headless and static screen use-cases. In other display-off scenarios hotplug may not work. [How] For display-off only allow idle optimization when no display is connected. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Increase idle worker HPD detection timeRoman Li
[Why] Idle worker thread waits HPD_DETECTION_TIME for HPD processing complete. Some displays require longer time for that. [How] Increase HPD_DETECTION_TIME to 100ms. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Remove useless assignmentsAlex Hung
[WHAT & HOW] "split_pipe" are assigned to test_pipe and then immediately are updated to other values. The same also applies to "status" as well. Similarly, "id", "dwb" and "unused_dpps" are assigned but the functions immediately return, and thus they have no effects. As a results, the assignments removed. This fixes 5 UNUSED_VALUE issues reported by Coverity. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Check status from dpcd_get_tunneling_device_dataAlex Hung
[WHAT & HOW] dpcd_get_tunneling_device_data calls core_link_read_dpcd which can fail. The status from core_link_read_dpcd should be checked and error messages is printed in case of failures. This fixes 1 UNUSED_VALUE issue reported by Coverity. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Check returns from drm_dp_dpcd_writeAlex Hung
[WHAT & HOW] drm_dp_dpcd_write() returns negative error on failure and thus returned values need to be checked. This fixes 3 UNUSED_VALUE issues reported by Coverity. Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu: Wait for reset on init completionLijo Lazar
When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization work is done after a successful return of modprobe. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdkfd: Accounting pdd vram_usage for svmPhilip Yang
Process device data pdd->vram_usage is read by rocm-smi via sysfs, this is currently missing the svm_bo usage accounting, so "rocm-smi --showpids" per process VRAM usage report is incorrect. Add pdd->vram_usage accounting when svm_bo allocation and release, change to atomic64_t type because it is updated outside process mutex now. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ringSrinivasan Shanmugam
This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amd/display: Add hpd_source index out-of-bounds check for dcn3x link ↵Srinivasan Shanmugam
encoder creation This patch adds a boundary check for the hpd_source index during the link encoder creation process for dcn3x IP's. The check ensures that the index is within the valid range of the link_enc_hpd_regs array to prevent out-of-bounds access. Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipesMichael Chen
With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu: Show current compute partition on VFLijo Lazar
Enable sysfs node for current compute partition mode on VFs also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFsLijo Lazar
Use the memory ranges published in discovery table to deduce NPS mode of GC v9.4.3 VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15ring-buffer: Fix reader locking when changing the sub buffer orderPetr Pavlu
The function ring_buffer_subbuf_order_set() updates each ring_buffer_per_cpu and installs new sub buffers that match the requested page order. This operation may be invoked concurrently with readers that rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or the ring_buffer_per_cpu.pages and reader_page pointers. However, no exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying the mentioned data while a reader also operates on them can then result in incorrect memory access and various crashes. Fix the problem by taking the reader_lock when updating a specific ring_buffer_per_cpu in ring_buffer_subbuf_order_set(). Link: https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/ Link: https://lore.kernel.org/linux-trace-kernel/20241010195849.2f77cc3f@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20241011112850.17212b25@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241015112440.26987-1-petr.pavlu@suse.com Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>