summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-15drm/amd/pm: Fix temperature unit of SMU v13.0.6Lijo Lazar
Temperature needs to be reported in millidegree Celsius. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd: Use pci_dev_id() to simplify the codeXiongfeng Wang
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it mannually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdkfd: fix double assign skip process context clearJonathan Kim
Remove redundant assignment when skipping process ctx clear. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd/display: Update replay for clk_mgr optimizationsBhawanpreet Lakha
Add Replay calls to clk_mgr updates (just like PSR) Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu: Fix identifier names to function definition arguments in atom.hSrinivasan Shanmugam
Fixes the following: WARNING: function definition argument 'struct card_info *' should also have an identifier name WARNING: function definition argument 'uint32_t' should also have an identifier name WARNING: function definition argument 'void *' should also have an identifier name WARNING: function definition argument 'struct atom_context *' should also have an identifier name WARNING: function definition argument 'int' should also have an identifier name WARNING: function definition argument 'uint32_t *' should also have an identifier name WARNING: Unnecessary space before function pointer name ERROR: space prohibited after that '*' (ctx:BxW) CHECK: Prefer kernel type 'u32' over 'uint32_t' Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7Umio Yasuno
Use the right metrics table version based on the firmware. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2720 Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu: mode1 reset needs to recover mp1 for mp0 v13_0_10YiPeng Chai
Mode1 reset needs to recover mp1 in fatal error case for mp0 v13_0_10. v2: Define a macro to wrap psp function calls. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd/pm: avoid driver getting empty metrics table for the first timeYang Wang
add metrics.AccumulationCouter check to avoid driver getting an empty metrics data since metrics table not updated completely in pmfw side. Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdkfd: Use memdup_user() rather than duplicating its implementationAtul Raut
To prevent its redundant implementation and streamline code, use memdup_user. This fixes warnings reported by Coccinelle: ./drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2811:13-20: WARNING opportunity for memdup_user Signed-off-by: Atul Raut <rauji.raut@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu: Remove unnecessary ras cap checkHawking Zhang
RAS global isr will only be invoked by hardware interrupt. Don't need to query ras capability in isr In addition, amdgpu_ras_interrupt_fatal_error_handler ensures the isr won't be called from guest linux side by accident. The RAS cap check in isr that introduced to fix sriov crash is not needed any more Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdkfd: fix build failure without CONFIG_DYNAMIC_DEBUGArnd Bergmann
When CONFIG_DYNAMIC_DEBUG is disabled altogether, calling _dynamic_func_call_no_desc() does not work: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function 'svm_range_set_attr': drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:52:9: error: implicit declaration of function '_dynamic_func_call_no_desc' [-Werror=implicit-function-declaration] 52 | _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:3564:9: note: in expansion of macro 'dynamic_svm_range_dump' 3564 | dynamic_svm_range_dump(svms); | ^~~~~~~~~~~~~~~~~~~~~~ Add a compile-time conditional in addition to the runtime check. Fixes: 8923137dbe4b ("drm/amdkfd: avoid svm dump when dynamic debug disabled") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16netfilter: nft_dynset: disallow object mapsPablo Neira Ayuso
Do not allow to insert elements from datapath to objects maps. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: GC transaction race with netns dismantlePablo Neira Ayuso
Use maybe_get_net() since GC workqueue might race with netns exit path. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix GC transaction races with netns and netlink event ↵Pablo Neira Ayuso
exit path Netlink event path is missing a synchronization point with GC transactions. Add GC sequence number update to netns release path and netlink event path, any GC transaction losing race will be discarded. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16ipvs: fix racy memcpy in proc_do_sync_thresholdSishuai Gong
When two threads run proc_do_sync_threshold() in parallel, data races could happen between the two memcpy(): Thread-1 Thread-2 memcpy(val, valp, sizeof(val)); memcpy(valp, val, sizeof(val)); This race might mess up the (struct ctl_table *) table->data, so we add a mutex lock to serialize them. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.com/ Signed-off-by: Sishuai Gong <sishuai.system@gmail.com> Acked-by: Simon Horman <horms@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: set default timeout to 3 secs for sctp shutdown send and recv stateXin Long
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300 msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state. As Paolo Valerio noticed, this might cause unwanted expiration of the ct entry. In my test, with 1s tc netem delay set on the NAT path, after the SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is sent back from the peer, the sctp ct entry has expired and been deleted, and then the SHUTDOWN_ACK has to be dropped. Also, it is confusing these two sysctl options always show 0 due to all timeout values using sec as unit: net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0 net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0 This patch fixes it by also using 3 secs for sctp shutdown send and recv state in sctp conntrack, which is also RTO.initial value in SCTP protocol. Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV was probably used for a rare scenario where SHUTDOWN is sent on 1st path but SHUTDOWN_ACK is replied on 2nd path, then a new connection started immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV to CLOSE when receiving INIT in the ORIGINAL direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: don't fail inserts if duplicate has expiredFlorian Westphal
nftables selftests fail: run-tests.sh testcases/sets/0044interval_overlap_0 Expected: 0-2 . 0-3, got: W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1 Insertion must ignore duplicate but expired entries. Moreover, there is a strange asymmetry in nft_pipapo_activate: It refetches the current element, whereas the other ->activate callbacks (bitmap, hash, rhash, rbtree) use elem->priv. Same for .remove: other set implementations take elem->priv, nft_pipapo_remove fetches elem->priv, then does a relookup, remove this. I suspect this was the reason for the change that prompted the removal of the expired check in pipapo_get() in the first place, but skipping exired elements there makes no sense to me, this helper is used for normal get requests, insertions (duplicate check) and deactivate callback. In first two cases expired elements must be skipped. For ->deactivate(), this gets called for DELSETELEM, so it seems to me that expired elements should be skipped as well, i.e. delete request should fail with -ENOENT error. Fixes: 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: deactivate catchall elements in next generationFlorian Westphal
When flushing, individual set elements are disabled in the next generation via the ->flush callback. Catchall elements are not disabled. This is incorrect and may lead to double-deactivations of catchall elements which then results in memory leaks: WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730 CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60 RIP: 0010:nft_map_deactivate+0x549/0x730 [..] ? nft_map_deactivate+0x549/0x730 nf_tables_delset+0xb66/0xeb0 (the warn is due to nft_use_dec() detecting underflow). Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix kdoc warnings after gc reworkFlorian Westphal
Jakub Kicinski says: We've got some new kdoc warnings here: net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc' net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc' include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set' Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-16netfilter: nf_tables: fix false-positive lockdep splatFlorian Westphal
->abort invocation may cause splat on debug kernels: WARNING: suspicious RCU usage net/netfilter/nft_set_pipapo.c:1697 suspicious rcu_dereference_check() usage! [..] rcu_scheduler_active = 2, debug_locks = 1 1 lock held by nft/133554: [..] (nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid [..] lockdep_rcu_suspicious+0x1ad/0x260 nft_pipapo_abort+0x145/0x180 __nf_tables_abort+0x5359/0x63d0 nf_tables_abort+0x24/0x40 nfnetlink_rcv+0x1a0a/0x22c0 netlink_unicast+0x73c/0x900 netlink_sendmsg+0x7f0/0xc20 ____sys_sendmsg+0x48d/0x760 Transaction mutex is held, so parallel updates are not possible. Switch to _protected and check mutex is held for lockdep enabled builds. Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-15drm/amdgpu: disable mcbp if parameter zero is setJiadong Zhu
The parameter amdgpu_mcbp shall have priority against the default value calculated from the chip version. User could disable mcbp by setting the parameter mcbp as zero. v2: do not trigger preemption in sw ring muxer when mcbp is disabled. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/radeon: Fix multiple line dereference in 'atom_iio_execute'Srinivasan Shanmugam
Fixes the following: WARNING: Avoid multiple line dereference - prefer 'ctx->io_attr' + ((ctx-> + io_attr >> CU8(base + 2)) & (0xFFFFFFFF >> (32 - Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd/pm: Add vclk and dclk sysnode for GC 9.4.3Asad Kamal
Expose sysfs vclck and dclk entries for GC version 9.4.3 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0Kenneth Feng
drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0 V2: depend on pm.no_fan to check Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdkfd: Add missing tba_hi programming on aldebaranJay Cornwall
Previously asymptomatic because high 32 bits were zero. Fixes: 96c211f1f9ef ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu: Fix missing comment for mb() in 'amdgpu_device_aper_access'Srinivasan Shanmugam
This patch adds the missing code comment for memory barrier WARNING: memory barrier without comment + mb(); WARNING: memory barrier without comment + mb(); Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amd/display: Add Replay supported/enabled checksBhawanpreet Lakha
- Add checks for Cursor update and dirty rects (sending updates to dmub) - Add checks for dc_notify_vsync, and fbc and subvp Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15fbdev: goldfishfb: Do not check 0 for platform_get_irq()Zhu Wang
Since platform_get_irq() never returned zero, so it need not to check whether it returned zero, and we use the return error code of platform_get_irq() to replace the current return error code. Please refer to the commit a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") to get that platform_get_irq() never returned zero. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15fbdev: atmel_lcdfb: Remove redundant of_match_ptr()Ruan Jinjie
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15fbdev: kyro: Remove unused declarationsYue Haibing
These declarations is never implemented since the beginning of git history. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-15Merge tag 'parisc-for-6.5-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Fix the parisc TLB ptlock checks so that they can be enabled together with the lightweight spinlock checks" * tag 'parisc-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix CONFIG_TLB_PTLOCK to work with lightweight spinlock checks
2023-08-15Merge tag '6.5-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Three smb client fixes, all for stable: - fix for oops in unmount race with lease break of deferred close - debugging improvement for reconnect - fix for fscache deadlock (folio_wait_bit_common hang)" * tag '6.5-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: display network namespace in debug information cifs: Release folio lock on fscache read hit. cifs: fix potential oops in cifs_oplock_break
2023-08-15Merge tag 'regulator-fix-v6.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two small driver specific fixes: one incorrect definition for one of the Qualcomm regulators and better handling of poorly formed DTs in the DA9063 driver" * tag 'regulator-fix-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: qcom-rpmh: Fix LDO 12 regulator for PM8550 regulator: da9063: better fix null deref with partial DT
2023-08-15net: fix the RTO timer retransmitting skb every 1ms if linear option is enabledJason Xing
In the real workload, I encountered an issue which could cause the RTO timer to retransmit the skb per 1ms with linear option enabled. The amount of lost-retransmitted skbs can go up to 1000+ instantly. The root cause is that if the icsk_rto happens to be zero in the 6th round (which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero due to the changed calculation method in tcp_retransmit_timer() as follows: icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX); Above line could be converted to icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0 Therefore, the timer expires so quickly without any doubt. I read through the RFC 6298 and found that the RTO value can be rounded up to a certain value, in Linux, say TCP_RTO_MIN as default, which is regarded as the lower bound in this patch as suggested by Eric. Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-15Merge tag 'asoc-fix-v6.5-rc6' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.5 A fairly large collection of fixes here, mostly SOF and Intel related. The one core fix is Hans' change which reduces the log spam when working out new use cases for DPCM.
2023-08-15drm/msm/adreno: Add missing MODULE_FIRMWARE macrosJuerg Haefliger
The driver references some firmware files that don't have corresponding MODULE_FIRMWARE macros and thus won't be listed via modinfo. Fix that. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Patchwork: https://patchwork.freedesktop.org/patch/543290/ [rob: drop a690_gmu.bin as a690 is using same fw as a660 now] Signed-off-by: Rob Clark <robdclark@chromium.org>
2023-08-15drm/msm/gpu: Push gpu lock down past runpmRob Clark
Avoid holding gpu lock when calling runpm, to avoid this lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.4.3-debug+ #14 Not tainted ------------------------------------------------------ ring0/373 is trying to acquire lock: ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98 but task is already holding lock: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&gpu->lock){+.+.}-{3:3}: __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 msm_job_run+0x7c/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 -> #3 (dma_fence_map){++++}-{0:0}: __dma_fence_might_wait+0x74/0xc0 dma_resv_lockdep+0x1f0/0x2e8 do_one_initcall+0xb4/0x214 kernel_init_freeable+0x338/0x33c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x7c/0x9c slab_pre_alloc_hook.constprop.0+0x40/0x250 __kmem_cache_alloc_node+0x60/0x18c kmalloc_node_trace+0x40/0x84 alloc_worker+0x2c/0x64 init_rescuer+0x34/0xe0 workqueue_init+0x168/0x1fc kernel_init_freeable+0x15c/0x33c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #1 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire+0x3c/0x48 fs_reclaim_acquire+0x50/0x9c slab_pre_alloc_hook.constprop.0+0x40/0x250 __kmem_cache_alloc_node+0x60/0x18c kmalloc_trace+0x44/0x88 clk_rcg2_dfs_determine_rate+0x60/0x214 clk_core_determine_round_nolock+0xb8/0xf0 clk_core_round_rate_nolock+0x84/0x118 clk_core_round_rate_nolock+0xd8/0x118 clk_round_rate+0x6c/0xd0 geni_se_clk_tbl_get+0x78/0xc0 geni_se_clk_freq_match+0x44/0xe4 get_spi_clk_cfg+0x50/0xf4 geni_spi_set_clock_and_bw+0x54/0x104 spi_geni_prepare_message+0x130/0x174 __spi_pump_transfer_message+0x200/0x4d8 __spi_sync+0x13c/0x23c spi_sync_locked+0x18/0x24 do_cros_ec_pkt_xfer_spi+0x124/0x3f0 cros_ec_xfer_high_pri_work+0x28/0x3c kthread_worker_fn+0x14c/0x27c kthread+0xf0/0x100 ret_from_fork+0x10/0x20 -> #0 (prepare_lock){+.+.}-{3:3}: __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 clk_prepare_lock+0x70/0x98 clk_prepare+0x24/0x50 clk_bulk_prepare+0x50/0x9c a6xx_gmu_resume+0x94/0x800 [msm] a6xx_gmu_pm_resume+0x38/0x158 [msm] adreno_runtime_resume+0x2c/0x38 [msm] pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x4c/0x134 rpm_callback+0x78/0x7c rpm_resume+0x3a4/0x46c __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 [msm] msm_gpu_submit+0x4c/0x12c [msm] msm_job_run+0x88/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: prepare_lock --> dma_fence_map --> &gpu->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&gpu->lock); lock(dma_fence_map); lock(&gpu->lock); lock(prepare_lock); *** DEADLOCK *** 2 locks held by ring0/373: #0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched] #1: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm] stack backtrace: CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14 Hardware name: Google Villager (rev1+) with LTE (DT) Call trace: dump_backtrace+0xb4/0xf0 show_stack+0x20/0x30 dump_stack_lvl+0x60/0x84 dump_stack+0x18/0x24 print_circular_bug+0x1cc/0x234 check_noncircular+0x78/0xac __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 clk_prepare_lock+0x70/0x98 clk_prepare+0x24/0x50 clk_bulk_prepare+0x50/0x9c a6xx_gmu_resume+0x94/0x800 [msm] a6xx_gmu_pm_resume+0x38/0x158 [msm] adreno_runtime_resume+0x2c/0x38 [msm] pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x4c/0x134 rpm_callback+0x78/0x7c rpm_resume+0x3a4/0x46c __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 [msm] msm_gpu_submit+0x4c/0x12c [msm] msm_job_run+0x88/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/552298/
2023-08-15accel/qaic: Clean up integer overflow checking in map_user_pages()Dan Carpenter
The encode_dma() function has some validation on in_trans->size but it would be more clear to move those checks to find_and_map_user_pages(). The encode_dma() had two checks: if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size) return -EINVAL; The in_trans->addr variable is the starting address. The in_trans->size variable is the total size of the transfer. The transfer can occur in parts and the resources->xferred_dma_size tracks how many bytes we have already transferred. This patch introduces a new variable "remaining" which represents the amount we want to transfer (in_trans->size) minus the amount we have already transferred (resources->xferred_dma_size). I have modified the check for if in_trans->size is zero to instead check if in_trans->size is less than resources->xferred_dma_size. If we have already transferred more bytes than in_trans->size then there are negative bytes remaining which doesn't make sense. If there are zero bytes remaining to be copied, just return success. The check in encode_dma() checked that "addr + size" could not overflow and barring a driver bug that should work, but it's easier to check if we do this in parts. First check that "in_trans->addr + resources->xferred_dma_size" is safe. Then check that "xfer_start_addr + remaining" is safe. My final concern was that we are dealing with u64 values but on 32bit systems the kmalloc() function will truncate the sizes to 32 bits. So I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);" and returned -EINVAL if it were >= SIZE_MAX. This will not affect 64bit systems. Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/24d3348b-25ac-4c1b-b171-9dae7c43e4e0@moroto.mountain
2023-08-15accel/qaic: Fix slicing memory leakPranjal Ramajor Asha Kanojiya
The temporary buffer storing slicing configuration data from user is only freed on error. This is a memory leak. Free the buffer unconditionally. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230802145937.14827-1-quic_jhugo@quicinc.com
2023-08-15mmc: f-sdh30: fix order of function calls in sdhci_f_sdh30_removeYangtao Li
The order of function calls in sdhci_f_sdh30_remove is wrong, let's call sdhci_pltfm_unregister first. Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 5def5c1c15bf ("mmc: sdhci-f-sdh30: Replace with sdhci_pltfm") Signed-off-by: Yangtao Li <frank.li@vivo.com> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230727070051.17778-62-frank.li@vivo.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-08-15Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Just a bunch of bugfixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (26 commits) virtio-mem: check if the config changed before fake offlining memory virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM) virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY virtio-mem: remove unsafe unplug in Big Block Mode (BBM) pds_vdpa: fix up debugfs feature bit printing pds_vdpa: alloc irq vectors on DRIVER_OK pds_vdpa: clean and reset vqs entries pds_vdpa: always allow offering VIRTIO_NET_F_MAC pds_vdpa: reset to vdpa specified mac virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case vdpa/mlx5: Fix crash on shutdown for when no ndev exists vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary vdpa/mlx5: Fix mr->initialized semantics vdpa/mlx5: Correct default number of queues when MQ is on virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs() vduse: Use proper spinlock for IRQ injection vdpa: Enable strict validation for netlinks ops vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check vdpa: Add features attr to vdpa_nl_policy for nlattr length check ...
2023-08-15Merge tag 'amd-drm-next-6.6-2023-08-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - SDMA 6.1.0 support - SMU 13.x fixes - PSP 13.x fixes - HDP 6.1 support - SMUIO 14.0 support - IH 6.1 support - Coding style cleanups - Misc display fixes - Initial Freesync panel replay support - RAS fixes - SDMA 5.2 MGCG updates - SR-IOV fixes - DCN3+ gamma fix - Revert zpos properly until IGT regression is fixed - NBIO 7.9 fixes - Use TTM to manage the doorbell BAR - Async flip fix - DPIA tracing support - DCN 3.x TMDS HDMI fixes - FRU fixes amdkfd: - Coding style cleanups - SVM fixes - Trap handler fixes - Convert older APUs to use dGPU path like newer APUs - Drop IOMMUv2 path as it is no longer used radeon: - Coding style cleanups drm buddy: - Fix debugging output UAPI: - A new memory pool was added to amdgpu_drm.h since we converted doorbell BAR management to use TTM, but userspace is blocked from allocating from it at this point, so kind of not really anything new here per se Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZNahZwAKCRC93/aFa7yZ # 2KNjAP0UV2vJZjrze7OQI/YoI+40UlGjS81nKGlMIN3eR8nzvAD/c9McLJViL82R # idEAK7tsr/MaCKoPAlED7CkUZiHNlQw= # =4w7I # -----END PGP SIGNATURE----- # gpg: Signature made Sat 12 Aug 2023 07:00:23 AEST # gpg: using EDDSA key 203B921D836B5735349902BDBDDFF6856BBC99D8 # gpg: Can't check signature: No public key From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230811211554.7804-1-alexander.deucher@amd.com
2023-08-15Merge tag 'drm-next-20230814' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/pinchartl/linux into drm-next Renesas R-Car DU miscellaneous changes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230814130531.GC22929@pendragon.ideasonboard.com
2023-08-14net: veth: Page pool creation error handling for existing pools onlyLiang Chen
The failure handling procedure destroys page pools for all queues, including those that haven't had their page pool created yet. this patch introduces necessary adjustments to prevent potential risks and inconsistency with the error handling behavior. Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling") Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Link: https://lore.kernel.org/r/20230812023016.10553-1-liangchen.linux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14Merge branch 'octeon_ep-fixes-for-error-and-remove-paths'Jakub Kicinski
Michal Schmidt says: ==================== octeon_ep: fixes for error and remove paths I have an Octeon card that's misconfigured in a way that exposes a couple of bugs in the octeon_ep driver's error paths. It can reproduce the issues that patches 1 & 4 are fixing. Patches 2 & 3 are a result of reviewing the nearby code. ==================== Link: https://lore.kernel.org/r/20230810150114.107765-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14octeon_ep: cancel queued works in probe error pathMichal Schmidt
If it fails to get the devices's MAC address, octep_probe exits while leaving the delayed work intr_poll_task queued. When the work later runs, it's a use after free. Move the cancelation of intr_poll_task from octep_remove into octep_device_cleanup. This does not change anything in the octep_remove flow, but octep_device_cleanup is called also in the octep_probe error path, where the cancelation is needed. Note that the cancelation of ctrl_mbox_task has to follow intr_poll_task's, because the ctrl_mbox_task may be queued by intr_poll_task. Fixes: 24d4333233b3 ("octeon_ep: poll for control messages") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20230810150114.107765-5-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14octeon_ep: cancel ctrl_mbox_task after intr_poll_taskMichal Schmidt
intr_poll_task may queue ctrl_mbox_task. The function octep_poll_non_ioq_interrupts_cn93_pf does this. When removing the driver and canceling these two works, cancel ctrl_mbox_task last to guarantee it does not run anymore. Fixes: 24d4333233b3 ("octeon_ep: poll for control messages") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20230810150114.107765-4-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14octeon_ep: cancel tx_timeout_task later in remove sequenceMichal Schmidt
tx_timeout_task is canceled too early when removing the driver. Nothing prevents .ndo_tx_timeout from triggering and queuing the work again. Better cancel it after the netdev is unregistered. It's harmless for octep_tx_timeout_task to run in the window between the unregistration and cancelation, because it checks netif_running. Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14octeon_ep: fix timeout value for waiting on mbox responseMichal Schmidt
The intention was to wait up to 500 ms for the mbox response. The third argument to wait_event_interruptible_timeout() is supposed to be the timeout duration. The driver mistakenly passed absolute time instead. Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230810150114.107765-2-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15Merge tag 'mediatek-drm-next-6.6' of ↵Dave Airlie
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-next Mediatek DRM Next for Linux 6.6 1. Small mtk-dpi cleanups 2. DisplayPort: support eDP and aux-bus 3. Fix uninitialized symbol 4. Do not check for 0 return after calling platform_get_irq() 5. Convert to platform remove callback returning void 6. Fix coverity issues 7. Fix potential memory leak if vmap() fail 8. Fix void-pointer-to-enum-cast warning 9. Rid W=1 warnings from GPU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230813152726.14802-1-chunkuang.hu@kernel.org