summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-05-22IB/IPoIB: Enqueue separate work_structs for each flushed interfaceCosmin Ratiu
Previously, flushing a netdevice involved first flushing all child devices from the flush task itself. That requires holding the lock that protects the list for the entire duration of the flush. This poses a problem when converting from vlan_rwsem to the netdev instance lock (next patch), because holding the parent lock while trying to acquire a child lock makes lockdep unhappy, rightfully. Fix this by splitting a big flush task into individual flush tasks (all are already created in their respective ipoib_dev_priv structs) and defining a helper function to enqueue all of them while holding the list lock. In ipoib_set_mac, the function is not used and the task is enqueued directly, because in the subsequent patches locking is changed and this function may be called with the netdev instance lock held. This is effectively a noop, the wq is single-threaded and ordered and will execute the same flush operations in the same order as before. Furthermore, there should be no new races because ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new work generation to wait for pending work to complete. flush_workqueue() waits for all currently enqueued work to finish before returning. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22Revert "drm/amd: Keep display off while going into S4"Mario Limonciello
commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") attempted to keep displays off during the S4 sequence by not resuming display IP. This however leads to hangs because DRM clients such as the console can try to access registers and cause a hang. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155 Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e485502c37b097b0bd773baa7e2741bf7bd2909a) Cc: stable@vger.kernel.org
2025-05-22Merge tag 'pinctrl-v6.15-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "This deals with a crash in the Qualcomm pin controller GPIO parts when using hogs. The first patch to gpiolib makes gpiochip_line_is_valid() NULL-tolerant. The second patch fixes the actual problem" * tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: switch to devm_register_sys_off_handler() gpiolib: don't crash on enabling GPIO HOG pins
2025-05-22RDMA/rxe: Break endless pagefault loop for RO pagesLeon Romanovsky
RO pages has "perm" equal to 0, that caused to the situation where such pages were marked as needed to have fault and caused to infinite loop. Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN") Reported-by: Daisuke Matsuda <dskmtsd@gmail.com> Closes: https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/ Link: https://patch.msgid.link/096fab178d48ed86942ee22eafe9be98e29092aa.1747913377.git.leonro@nvidia.com Tested-by: Daisuke Matsuda <dskmtsd@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-05-22Merge tag 'coresight-next-v6.16' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: updates for Linux v6.16 CoreSight self-hosted trace driver subsystem updates for Linux v6.16 includes: - Clear CLAIM tags on device probe if self-hosted tags are set. - Support for perf AUX pause/resume for CoreSight ETM PMU driver, with trace collection at pause. - Miscellaneous fixes for the subsystem Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (27 commits) coresight: prevent deactivate active config while enabling the config coresight: holding cscfg_csdev_lock while removing cscfg from csdev coresight/etm4: fix missing disable active config coresight: etm4x: Fix timestamp bit field handling coresight: tmc: fix failure to disable/enable ETF after reading Documentation: coresight: Document AUX pause and resume coresight: perf: Update buffer on AUX pause coresight: tmc: Re-enable sink after buffer update coresight: perf: Support AUX trace pause and resume coresight: etm4x: Hook pause and resume callbacks coresight: Introduce pause and resume APIs for source coresight: etm4x: Extract the trace unit controlling coresight: cti: Replace inclusion by struct fwnode_handle forward declaration coresight: Disable MMIO logging for coresight stm driver coresight: replicator: Fix panic for clearing claim tag coresight: Add a KUnit test for coresight_find_default_sink() coresight: Remove extern from function declarations coresight: Remove inlines from static function definitions coresight: Clear self hosted claim tag on probe coresight: etm3x: Convert raw base pointer to struct coresight access ...
2025-05-22Revert "drm/amd: Keep display off while going into S4"Mario Limonciello
commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") attempted to keep displays off during the S4 sequence by not resuming display IP. This however leads to hangs because DRM clients such as the console can try to access registers and cause a hang. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155 Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22ublk: run auto buf unregisgering in same io_ring_ctx with registeringMing Lei
UBLK_F_AUTO_BUF_REG requires that the buffer registered automatically is unregistered in same `io_ring_ctx`, so check it explicitly. Document this requirement for UBLK_F_AUTO_BUF_REG. Drop WARN_ON_ONCE() which is triggered from userspace code path. Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reported-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522152043.399824-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22drm/amd/pm: Fetch partition metrics on SMUv13.0.12Lijo Lazar
Add support to fetch compute partition related metrics in SMUv13.0.12 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22Revert "drm/amd/display: [FW Promotion] Release 0.1.11.0"Aurabindo Pillai
This reverts commit 81fc9ca25f02c53c055b842a40f2a915bd0bd5e0 since it introduces incompatbility with older firmware Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang
To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: update ras support checkMangesh Gadre
update ras support check for vcn 5.0.1 Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for jpeg 5.0.1Mangesh Gadre
Enable jpeg ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add jpeg poison status regMangesh Gadre
added registers to enable jpeg ras Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for vcn 5.0.1Mangesh Gadre
Enable vcn ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Add a new dcdebugmask to allow skip detection LTWayne Lin
Under specific embedded scenarios, we might still use DP interface rather than eDP interface. Under such case, detection link training is unnecessary. Add a new dcdebugmask value that can be used to skip the detection LT Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Link: https://lore.kernel.org/amd-gfx/20250521063934.2111323-1-Wayne.Lin@amd.com/ Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: no 3D and blnd LUT as DPP color caps for DCN401Melissa Wen
Match what is declared as DPP color caps with hw caps. DCN401 has MPC shaper + 3D LUTs that are movable before and after blending (get from plane or stream), but no DPP blend LUTs. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: only collect data if debug gamut_remap is availableMelissa Wen
Color gamut_remap state log may be not available for some hw versions, so prevent null pointer dereference by checking if there is a function to collect data for this hw version. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Remove duplicated "context still alive" checkTvrtko Ursulin
When amdgpu_ctx_mgr_fini() calls amdgpu_ctx_mgr_entity_fini() it contains the exact same "context still alive" check as it will do next. Remove the duplicated copy. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Make amdgpu_ctx_mgr_entity_fini staticTvrtko Ursulin
Function amdgpu_ctx_mgr_entity_fini() only has a single local caller so lets make it local. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Update runtime pm checksAlex Deucher
Don't enable BACO when in passthrough. PCI resets don't work correctly when in BACO. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add vcn poison status regMangesh Gadre
added register to enable vcn ras Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use external link order for xgmi dataLijo Lazar
xgmi_port_num interface reports external link number for port number. To be consistent, use the external link number for reporting other XGMI link data also. v2: For invalid link number return -EINVAL (Kevin) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/radeon: fixing typo in macro nameJihed Chaibi
"ENABLE" is currently misspelled in SYS_INFO_GPUCAPS__ENABEL_DFS_BYPASS Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: fixing typo in macro nameJihed Chaibi
"ENABLE" is currently misspelled in SYS_INFO_GPUCAPS__ENABEL_DFS_BYPASS PS: checkpatch.pl is complaining about the presence of a space at the start of drivers/gpu/drm/amd/include/atomfirmware.h line: 1716 This is propably because this file uses (two) spaces and not tabs. Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: fix typo in commentsDaniil Ryabov
Fix double 'u' in 'frequuency' Signed-off-by: Daniil Ryabov <daniilryabov4@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Adjust set_value function with prefix to help in ftraceLeonardo Gomes
Adjust set_value function in hw_hpd.c file to have prefix to help in ftrace, the name change from 'set_value' to 'dal_hw_hpd_set_value' Signed-off-by: Leonardo da Silva Gomes <leonardodasigomes@gmail.com> Co-developed-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Adjust get_value function with prefix to help in ftraceLeonardo Gomes
Adjust get_value function in hw_hpd.c file to have prefix to help in ftrace, the name change from 'get_value' to 'dal_hw_hpd_get_value' Signed-off-by: Leonardo da Silva Gomes <leonardodasigomes@gmail.com> Co-developed-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Fetch partition metrics on SMUv13.0.6Lijo Lazar
Add support to fetch compute partition related metrics in SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add sysfs nodes for partitionLijo Lazar
Add sysfs nodes to provide compute paritition specific data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Add support to query partition metricsLijo Lazar
Add interfaces to query compute partition related metrics data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram jpeg poison irqStanley.Yang
Register aqua vanjaram jpeg poison irq, add jpeg poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram vcn poison irqStanley.Yang
Register aqua vanjaram vcn poison irq, add vcn poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use macro to initialize metrics tableLijo Lazar
Helps to keep a build time check about usage of right datatype and avoids maintenance as new versions get added. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Fill pldm version for SMU v13.0.6 SOCsAsad Kamal
Fetch pldm version from static metrics table for SMU v13.0.6 SOCs Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Update pmfw headers for smu_v_13_0_6Asad Kamal
Update pmfw headers for smu_v_13_0_6 to include pldm version as part of statics metrics table Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Fix eviction fence worker race during fd closeJesse.Zhang
The current cleanup order during file descriptor close can lead to a race condition where the eviction fence worker attempts to access a destroyed mutex from the user queue manager: [ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564 [ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] The issue occurs because: 1. We destroy the user queue manager (including its mutex) first 2. Then try to destroy eviction fences which may have pending work 3. The eviction fence worker may try to access the already-destroyed mutex Fix this by reordering the cleanup to: 1. First mark the fd as closing and destroy eviction fences, which flushes any pending work 2. Then safely destroy the user queue manager after we're certain no more fence work will be executed The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: lock the eviction fence for wq signals itPrike Liang
Lock and refer to the eviction fence before the eviction fence schedules work queue tries to signal it. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdkfd: Change svm_range_get_info return typeAndrey Vatoropin
Static analysis shows that pointer "svms" cannot be NULL because it points to the object "struct svm_range_list". Remove the extra NULL check. It is meaningless and harms the readability of the code. In the function svm_range_get_info() there is no possibility of failure. Therefore, the caller of the function svm_range_get_info() does not need a return value. Change the function svm_range_get_info() return type from "int" to "void". Since the function svm_range_get_info() has a return type of "void". The caller of the function svm_range_get_info() does not need a return value. Delete extra code. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22EDAC/bluefield: Don't use bluefield_edac_readl() result on errorDavid Thompson
The bluefield_edac_readl() routine returns an uninitialized result on error paths. In those cases the calling routine should not use the uninitialized result. The driver should simply log the error, and then return early. Fixes: e41967575474 ("EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2") Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com> Link: https://lore.kernel.org/20250318214747.12271-1-davthompson@nvidia.com
2025-05-22eth: bnxt: fix deadlock when xdp is attached or detachedTaehee Yoo
When xdp is attached or detached, dev->ndo_bpf() is called by do_setlink(), and it acquires netdev_lock() if needed. Unlike other drivers, the bnxt driver is protected by netdev_lock while xdp is attached/detached because it sets dev->request_ops_lock to true. So, the bnxt_xdp(), that is callback of ->ndo_bpf should not acquire netdev_lock(). But the xdp_features_{set | clear}_redirect_target() was changed to acquire netdev_lock() internally. It causes a deadlock. To fix this problem, bnxt driver should use xdp_features_{set | clear}_redirect_target_locked() instead. Splat looks like: ============================================ WARNING: possible recursive locking detected 6.15.0-rc6+ #1 Not tainted -------------------------------------------- bpftool/1745 is trying to acquire lock: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: xdp_features_set_redirect_target+0x1f/0x80 but task is already holding lock: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->lock); lock(&dev->lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by bpftool/1745: #0: ffffffffa56131c8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x1fe/0x570 #1: ffffffffaafa75a0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x236/0x570 #2: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0 stack backtrace: CPU: 1 UID: 0 PID: 1745 Comm: bpftool Not tainted 6.15.0-rc6+ #1 PREEMPT(undef) Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 Call Trace: <TASK> dump_stack_lvl+0x7a/0xd0 print_deadlock_bug+0x294/0x3d0 __lock_acquire+0x153b/0x28f0 lock_acquire+0x184/0x340 ? xdp_features_set_redirect_target+0x1f/0x80 __mutex_lock+0x1ac/0x18a0 ? xdp_features_set_redirect_target+0x1f/0x80 ? xdp_features_set_redirect_target+0x1f/0x80 ? __pfx_bnxt_rx_page_skb+0x10/0x10 [bnxt_en ? __pfx___mutex_lock+0x10/0x10 ? __pfx_netdev_update_features+0x10/0x10 ? bnxt_set_rx_skb_mode+0x284/0x540 [bnxt_en ? __pfx_bnxt_set_rx_skb_mode+0x10/0x10 [bnxt_en ? xdp_features_set_redirect_target+0x1f/0x80 xdp_features_set_redirect_target+0x1f/0x80 bnxt_xdp+0x34e/0x730 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3] dev_xdp_install+0x3f4/0x830 ? __pfx_bnxt_xdp+0x10/0x10 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3] ? __pfx_dev_xdp_install+0x10/0x10 dev_xdp_attach+0x560/0xf70 dev_change_xdp_fd+0x22d/0x280 do_setlink.constprop.0+0x2989/0x35d0 ? __pfx_do_setlink.constprop.0+0x10/0x10 ? lock_acquire+0x184/0x340 ? find_held_lock+0x32/0x90 ? rtnl_setlink+0x236/0x570 ? rcu_is_watching+0x11/0xb0 ? trace_contention_end+0xdc/0x120 ? __mutex_lock+0x946/0x18a0 ? __pfx___mutex_lock+0x10/0x10 ? __lock_acquire+0xa95/0x28f0 ? rcu_is_watching+0x11/0xb0 ? rcu_is_watching+0x11/0xb0 ? cap_capable+0x172/0x350 rtnl_setlink+0x2cd/0x570 Fixes: 03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250520071155.2462843-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22spi: spi-fsl-dspi: Reset SR flags before sending a new messageLarisa Grigore
If, in a previous transfer, the controller sends more data than expected by the DSPI target, SR.RFDF (RX FIFO is not empty) will remain asserted. When flushing the FIFOs at the beginning of a new transfer (writing 1 into MCR.CLR_TXF and MCR.CLR_RXF), SR.RFDF should also be cleared. Otherwise, when running in target mode with DMA, if SR.RFDF remains asserted, the DMA callback will be fired before the controller sends any data. Take this opportunity to reset all Status Register fields. Fixes: 5ce3cc567471 ("spi: spi-fsl-dspi: Provide support for DSPI slave mode operation (Vybryd vf610)") Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-3-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22spi: spi-fsl-dspi: Halt the module after a new message transferBogdan-Gabriel Roman
The XSPI mode implementation in this driver still uses the EOQ flag to signal the last word in a transmission and deassert the PCS signal. However, at speeds lower than ~200kHZ, the PCS signal seems to remain asserted even when SR[EOQF] = 1 indicates the end of a transmission. This is a problem for target devices which require the deassertation of the PCS signal between transfers. Hence, this commit 'forces' the deassertation of the PCS by stopping the module through MCR[HALT] after completing a new transfer. According to the reference manual, the module stops or transitions from the Running state to the Stopped state after the current frame, when any one of the following conditions exist: - The value of SR[EOQF] = 1. - The chip is in Debug mode and the value of MCR[FRZ] = 1. - The value of MCR[HALT] = 1. This shouldn't be done if the last transfer in the message has cs_change set. Fixes: ea93ed4c181b ("spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode") Signed-off-by: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com> Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-2-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22spi: spi-fsl-dspi: restrict register range for regmap accessLarisa Grigore
DSPI registers are NOT continuous, some registers are reserved and accessing them from userspace will trigger external abort, add regmap register access table to avoid below abort. For example on S32G: # cat /sys/kernel/debug/regmap/401d8000.spi/registers Internal error: synchronous external abort: 96000210 1 PREEMPT SMP ... Call trace: regmap_mmio_read32le+0x24/0x48 regmap_mmio_read+0x48/0x70 _regmap_bus_reg_read+0x38/0x48 _regmap_read+0x68/0x1b0 regmap_read+0x50/0x78 regmap_read_debugfs+0x120/0x338 Fixes: 1acbdeb92c87 ("spi/fsl-dspi: Convert to use regmap and add big-endian support") Co-developed-by: Xulin Sun <xulin.sun@windriver.com> Signed-off-by: Xulin Sun <xulin.sun@windriver.com> Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-1-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22drm/scheduler: signal scheduled fence when kill jobLin.Cao
When an entity from application B is killed, drm_sched_entity_kill() removes all jobs belonging to that entity through drm_sched_entity_kill_jobs_work(). If application A's job depends on a scheduled fence from application B's job, and that fence is not properly signaled during the killing process, application A's dependency cannot be cleared. This leads to application A hanging indefinitely while waiting for a dependency that will never be resolved. Fix this issue by ensuring that scheduled fences are properly signaled when an entity is killed, allowing dependent applications to continue execution. Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250515020713.1110476-1-lincao12@amd.com
2025-05-22mfd: tps65010: Use per-client debugfs directoryWolfram Sang
The I2C core now provides a debugfs entry for each client. Let this driver use it instead of the root directory. Further improvements by this change: automatic support of multiple instances. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20250318091234.22170-1-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-05-22mfd: aat2870: Use per-client debugfs directoryWolfram Sang
The I2C core now provides a debugfs entry for each client. Let this driver use it instead of the custom directory in debugfs root. Further improvements by this change: automatic clean up on removal, support of multiple instances. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20250318091426.22258-2-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-05-22Merge branches 'ib-firmware-mfd-6.16', 'ib-mfd-clocksource-pwm-6.16', ↵Lee Jones
'ib-mfd-gpio-nvmem-6.16', 'ib-mfd-regulator-6.16' and 'ib-mfd-regulator-6.16-1' into ibs-for-mfd-merged
2025-05-22cxl/features: Remove the inline specifier from to_cxlfs()Alison Schofield
to_cxlfs() was declared 'inline' in the header but only defined in drivers/cxl/core/features.c. This has worked because features.c was the only file using the function and the definition happened to be available in the same compilation unit. However, in preparation for a second .c file using the header and needing to call the function, the inline specifier became an issue. Sparse flagged the declaration as invalid since 'inline' requires a visible definition at the point of use. Defining the function in the header was considered but rejected, as it depends on internal symbols not visible at that level. Remove the inline specifier to correct the linkage violation. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250521233625.1745849-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-22s390/pci: Prevent self deletion in disable_slot()Niklas Schnelle
As disable_slot() takes a struct zpci_dev from the Configured to the Standby state. In Standby there is still a hotplug slot so this is not usually a case of sysfs self deletion. This is important because self deletion gets very hairy in terms of locking (see for example recover_store() in arch/s390/pci/pci_sysfs.c). Because the pci_dev_put() is not within the critical section of the zdev->state_lock however, disable_slot() can turn into a case of self deletion if zPCI device event handling slips between the mutex_unlock() and the pci_dev_put(). If the latter is the last put and zpci_release_device() is called this then tries to remove the hotplug slot via zpci_exit_slot() which will try to remove the hotplug slot directory the disable_slot() is part of i.e. self deletion. Prevent this by widening the zdev->state_lock critical section to include the pci_dev_put() which is then guaranteed to happen with the struct zpci_dev still in Standby state ensuring it will not lead to a zpci_release_device() call as at least the zPCI event handling code still holds a reference. Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-22dm mpath: replace spin_lock_irqsave with spin_lock_irqMikulas Patocka
Replace spin_lock_irqsave/spin_unlock_irqrestore with spin_lock_irq/spin_unlock_irq at places where it is known that interrupts are enabled. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>