Age | Commit message | Author
2025-05-22Merge tag 'icc-6.16-rc1' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next Georgi writes: interconnect changes for 6.16 This pull request contains the interconnect changes for the 6.16-rc1 merge window. The core and driver changes are listed below. Core changes: - Add support for dynamic id allocation, that allows creating multiple instances of the same provider Driver changes: - Add driver for the EPSS L3 instances on SA8775P SoC - Add QoS support for SM8650 SoC - Add some missing nodes for SM8650 - Misc dt-binding style and indentation fixes Signed-off-by: Georgi Djakov <djakov@kernel.org> * tag 'icc-6.16-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc: interconnect: qcom: sm8650: remove regmap config for mc_virt & clk_virt interconnect: qcom: sm8650: add the MASTER_APSS_NOC dt-bindings: interconnect: sm8650: document the MASTER_APSS_NOC interconnect: qcom: sm8650: enable QoS configuration dt-bindings: interconnect: Correct indentation and style in DTS example interconnect: qcom: sa8775p: Add dynamic icc node id support interconnect: qcom: icc-rpmh: Add dynamic icc node id support interconnect: qcom: Add multidev EPSS L3 support interconnect: core: Add dynamic id allocation support dt-bindings: interconnect: Add EPSS L3 compatible for SA8775P
2025-05-22Merge tag 'net-6.15-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "This is somewhat larger than what I hoped for, with a few PRs from subsystems and follow-ups for the recent netdev locking changes, anyhow there are no known pending regressions. Including fixes from bluetooth, ipsec and CAN. Current release - regressions: - eth: team: grab team lock during team_change_rx_flags - eth: bnxt_en: fix netdev locking in ULP IRQ functions Current release - new code bugs: - xfrm: ipcomp: fix truesize computation on receive - eth: airoha: fix page recycling in airoha_qdma_rx_process() Previous releases - regressions: - sched: hfsc: fix qlen accounting bug when using peek in hfsc_enqueue() - mr: consolidate the ipmr_can_free_table() checks. - bridge: netfilter: fix forwarding of fragmented packets - xsk: bring back busy polling support in XDP_COPY - can: - add missing rcu read protection for procfs content - kvaser_pciefd: force IRQ edge in case of nested IRQ Previous releases - always broken: - xfrm: espintcp: remove encap socket caching to avoid reference leak - bluetooth: use skb_pull to avoid unsafe access in QCA dump handling - eth: idpf: - fix null-ptr-deref in idpf_features_check - fix idpf_vport_splitq_napi_poll() - eth: hibmcge: fix wrong ndo.open() after reset fail issue" * tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits) octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG octeontx2-af: Set LMT_ENA bit for APR table entries net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf selftests/tc-testing: Add an HFSC qlen accounting test sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue() idpf: fix idpf_vport_splitq_napi_poll() net: hibmcge: fix wrong ndo.open() after reset fail issue. 
net: hibmcge: fix incorrect statistics update issue xsk: Bring back busy polling support in XDP_COPY can: slcan: allow reception of short error messages net: lan743x: Restore SGMII CTRL register on resume bnxt_en: Fix netdev locking in ULP IRQ functions MAINTAINERS: Drop myself to reviewer for ravb driver net: dwmac-sun8i: Use parsed internal PHY address instead of 1 net: ethernet: ti: am65-cpsw: Lower random mac address error print to info can: kvaser_pciefd: Continue parsing DMA buf after dropped RX can: kvaser_pciefd: Fix echo_skb race can: kvaser_pciefd: Force IRQ edge in case of nested IRQ idpf: fix null-ptr-deref in idpf_features_check ...
2025-05-22Merge branch 'net-mlx5-convert-mlx5-to-netdev-instance-locking'Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5: Convert mlx5 to netdev instance locking Cosmin Ratiu says: mlx5 manages multiple netdevices, from basic Ethernet to Infiniband netdevs. This patch series converts the driver to use netdev instance locking for everything in preparation for TCP devmem Zero Copy. Because mlx5 is tightly coupled with the ipoib driver, a series of changes first happen in ipoib to allow it to work with mlx5 netdevs that use instance locking: IB/IPoIB: Enqueue separate work_structs for each flushed interface IB/IPoIB: Replace vlan_rwsem with the netdev instance lock IB/IPoIB: Allow using netdevs that require the instance lock A small patch then avoids dropping RTNL during firmware update: net/mlx5e: Don't drop RTNL during firmware flash The main patch then converts all mlx5 netdevs to use instance locking: net/mlx5e: Convert mlx5 netdevs to instance locking ==================== Link: https://patch.msgid.link/1747829342-1018757-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22net/mlx5e: Convert mlx5 netdevs to instance lockingCosmin Ratiu
This patch converts mlx5 to use the new netdev instance lock in addition to the pre-existing state_lock (and the RTNL).

mlx5e_priv.state_lock was already used throughout mlx5 to protect against concurrent state modifications on the same netdev, usually in addition to the RTNL. The new netdev instance lock will eventually replace it, but for now, it is acquired in addition to the existing locks in the order RTNL -> instance lock -> state_lock.

All three netdev types handled by mlx5 are converted to the new style of locking, because they share a lot of code related to initializing channels and dealing with NAPI, so it's better to convert all three rather than introduce different assumptions deep in the call stack depending on the type of device.

Because of the nature of the call graphs in mlx5, it wasn't possible to incrementally convert parts of the driver to use the new lock, since either all call paths into NAPI have to possess the new lock if the *_locked variants are used, or none of them can have the lock.

One area which required extra care is the interaction between closing channels and devlink health reporter tasks. Previously, the recovery tasks were unconditionally acquiring the RTNL, which could lead to deadlocks in these scenarios:

T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked -> mlx5e_close_channels -> mlx5e_ptp_close -> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs -> mlx5e_ptp_close_txqsq -> cancel_work_sync(&ptpsq->report_unhealthy_work) waits for

T2: mlx5e_ptpsq_unhealthy_work -> mlx5e_reporter_tx_ptpsq_unhealthy -> mlx5e_health_report -> devlink_health_report -> devlink_health_reporter_recover -> mlx5e_tx_reporter_ptpsq_unhealthy_recover which does: rtnl_lock(); => Deadlock.

Another similar instance of this is:

T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked -> mlx5e_close_channels -> mlx5e_ptp_close -> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs -> mlx5e_ptp_close_txqsq -> cancel_work_sync(&sq->recover_work) waits for

T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report -> devlink_health_report -> devlink_health_reporter_recover -> mlx5e_tx_reporter_err_cqe_recover which does: rtnl_lock(); => Another deadlock.

Fix that by using the same pattern previously used in mlx5e_tx_timeout_work, which repeatedly tried to acquire the RTNL until either: a) it was successfully acquired or b) there was no need for the work to be done any more (channel is being closed).

Now, for all three recovery tasks, acquisition of the instance lock is retried until it succeeds or the channel/SQ is closed.

As a side-effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state) check from mlx5e_tx_timeout_work; it's weaker than !test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) and unnecessary.

Future patches will introduce new call paths (from netdev queue management ops) which can close channels (and call cancel_work_sync on the recovery tasks) without the RTNL lock and only with the netdev instance lock.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747829342-1018757-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
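The retry pattern described in this commit can be illustrated outside the kernel. The following is a minimal userspace sketch in Python, not the driver code: `Channel`, `recover_work` and `close_channel` are hypothetical names, a `threading.Lock` stands in for the netdev instance lock, and `worker.join()` plays the role of cancel_work_sync(). The point it shows is that a recovery task which only ever try-locks, and gives up once the channel is marked inactive, cannot deadlock against a close path that holds the lock while waiting for it.

```python
import threading
import time


class Channel:
    """Toy stand-in for a queue whose recovery work may race with close."""

    def __init__(self):
        self.instance_lock = threading.Lock()  # models the netdev instance lock
        self.active = threading.Event()        # models a "channels active" state bit
        self.active.set()
        self.recovered = False


def recover_work(ch, poll_interval=0.001):
    """Recovery task: never blocks on the lock, so close can wait for it safely."""
    while not ch.instance_lock.acquire(blocking=False):
        if not ch.active.is_set():
            return False  # channel is being closed, nothing left to recover
        time.sleep(poll_interval)
    try:
        if not ch.active.is_set():
            return False
        ch.recovered = True
        return True
    finally:
        ch.instance_lock.release()


def close_channel(ch, worker):
    """Close path: holds the lock, then waits for the worker to finish."""
    with ch.instance_lock:
        ch.active.clear()
        # A worker blocked on the lock would make this wait hang.
        worker.join(timeout=5)
        return not worker.is_alive()
```

If `recover_work` used a blocking acquire instead, `close_channel` would wait on a worker that is itself waiting for the lock held by `close_channel`, which is the shape of the two deadlocks listed above.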
2025-05-22net/mlx5e: Don't drop RTNL during firmware flashCosmin Ratiu
There's no explanation in the original commit of why that was done, but presumably flashing takes a long time and holding RTNL for so long blocks other interactions with the netdev layer. However, the stack is moving towards netdev instance locking, and dropping and reacquiring RTNL in the context of flashing introduces lock ordering issues: RTNL must be acquired before the netdev instance lock and released after it. This patch therefore takes the simpler approach of no longer dropping and reacquiring the RTNL, as the RTNL for ethtool will soon be removed, leaving only the instance lock to protect against races. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22IB/IPoIB: Allow using netdevs that require the instance lockCosmin Ratiu
After the last patch removing vlan_rwsem, it is an incremental step to allow ipoib to work with netdevs that require the instance lock. In several places, netdev_lock() is changed to netdev_lock_ops_to_full() which takes care of not acquiring the lock again when the netdev is already locked. In ipoib_ib_tx_timeout_work() and __ipoib_ib_dev_flush() for HEAVY flushes, the netdev lock is acquired/released. This is needed because these functions end up calling .ndo_stop()/.ndo_open() on subinterfaces, and the device may expect the netdev instance lock to be held. ipoib_set_mode() now explicitly acquires ops lock while manipulating the features, mtu and tx queues. Finally, ipoib_napi_enable()/ipoib_napi_disable() now use the *_locked variants of the napi_enable()/napi_disable() calls and optionally acquire the netdev lock themselves depending on the dev they operate on. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22IB/IPoIB: Replace vlan_rwsem with the netdev instance lockCosmin Ratiu
vlan_rwsem was added more than a decade ago to work around a deadlock involving the original mutex being acquired twice, once from the wq. Subsequent changes then tweaked it to partially protect access to ipoib_dev_priv->child_intfs together with the RTNL. Flushing the wq synchronously was also since then refactored to happen separately. This semaphore unfortunately prevents updating ipoib to work with devices that require the netdev lock, because of lock ordering issues between RTNL, vlan_rwsem and the netdev instance locks of parent and child devices. To uncomplicate things, this commit replaces vlan_rwsem with the netdev instance lock of the parent device. Both parent child_intfs list and the children's list membership in it require holding the parent netdev instance lock. All call paths were carefully reviewed and no-longer-needed ASSERT_RTNL calls were dropped. Some non-trivial changes: - ipoib_match_gid_pkey_addr() now only acquires the instance lock and iterates through child_intfs for the first level of recursion (the parent), as it's not possible to have multiple levels of nested subinterfaces. - ipoib_open() and ipoib_stop() schedule tasks on the global workqueue to open/stop child interfaces to avoid potentially acquiring nested netdev instance locks. To avoid the device going away between the task scheduling and execution, netdev_hold/netdev_put are used. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
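The hold/put idea in the last bullet can be sketched as plain reference counting around deferred work. This is a toy Python model under stated assumptions: `Device`, `schedule_open` and `run_pending` are hypothetical names, and the `deque` merely stands in for a workqueue; it is not how netdev_hold/netdev_put are implemented.

```python
from collections import deque


class Device:
    """Toy refcounted device; considered freed when the last reference drops."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # reference owned by whoever registered the device
        self.freed = False
        self.is_open = False

    def hold(self):
        assert not self.freed
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


workqueue = deque()          # stands in for the global workqueue


def schedule_open(child):
    """Pin the child before queueing so it outlives the scheduling gap."""
    child.hold()
    workqueue.append(child)


def run_pending():
    """Run queued tasks, dropping each task's reference when it is done."""
    while workqueue:
        child = workqueue.popleft()
        try:
            child.is_open = True   # safe: the task's reference keeps it alive
        finally:
            child.put()
```

Even if the registering side drops its reference between `schedule_open` and `run_pending`, the device in this sketch is only marked freed after the queued task has finished with it.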
2025-05-22IB/IPoIB: Enqueue separate work_structs for each flushed interfaceCosmin Ratiu
Previously, flushing a netdevice involved first flushing all child devices from the flush task itself. That requires holding the lock that protects the list for the entire duration of the flush. This poses a problem when converting from vlan_rwsem to the netdev instance lock (next patch), because holding the parent lock while trying to acquire a child lock makes lockdep unhappy, rightfully. Fix this by splitting a big flush task into individual flush tasks (all are already created in their respective ipoib_dev_priv structs) and defining a helper function to enqueue all of them while holding the list lock. In ipoib_set_mac, the function is not used and the task is enqueued directly, because in the subsequent patches locking is changed and this function may be called with the netdev instance lock held. This is effectively a noop, the wq is single-threaded and ordered and will execute the same flush operations in the same order as before. Furthermore, there should be no new races because ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new work generation to wait for pending work to complete. flush_workqueue() waits for all currently enqueued work to finish before returning. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747829342-1018757-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
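A rough model of the split may help: one task per interface is enqueued while the list lock is held, and the tasks themselves never touch the child list. The Python below is only a sketch; `Interface`, `queue_flush_all` and `drain` are hypothetical names, and the `deque` drained by a single loop stands in for the single-threaded, ordered workqueue the commit relies on.

```python
import threading
from collections import deque


class Interface:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.list_lock = threading.Lock()  # protects self.children


def flush_one(iface, log):
    """Per-interface flush task; it does not walk any child list."""
    log.append(iface.name)


def queue_flush_all(parent, wq, log):
    """Enqueue one task per interface; the list lock is held only to enqueue."""
    with parent.list_lock:
        for child in parent.children:
            wq.append(lambda c=child: flush_one(c, log))
    wq.append(lambda: flush_one(parent, log))


def drain(wq):
    """Single-threaded, ordered execution: tasks run in enqueue order."""
    while wq:
        wq.popleft()()
```

Because the queue is drained in order by one worker, the children are still flushed before the parent, as in the original single big task, but no flush runs while the list lock is held.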
2025-05-22Revert "drm/amd: Keep display off while going into S4"Mario Limonciello
commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") attempted to keep displays off during the S4 sequence by not resuming display IP. This however leads to hangs because DRM clients such as the console can try to access registers and cause a hang. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155 Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e485502c37b097b0bd773baa7e2741bf7bd2909a) Cc: stable@vger.kernel.org
2025-05-22Merge tag 'pinctrl-v6.15-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "This deals with a crash in the Qualcomm pin controller GPIO parts when using hogs. The first patch to gpiolib makes gpiochip_line_is_valid() NULL-tolerant. The second patch fixes the actual problem" * tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: switch to devm_register_sys_off_handler() gpiolib: don't crash on enabling GPIO HOG pins
2025-05-22trace/io_uring: fix io_uring_local_work_run ctx documentationCaleb Sander Mateos
The comment for the tracepoint io_uring_local_work_run refers to a field "tctx" and a type "io_uring_ctx", neither of which exist. "tctx" looks to mean "ctx" and "io_uring_ctx" should be "io_ring_ctx". Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250522150451.2385652-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22Merge tag 'sound-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes for 6.15 final. It became slightly a higher amount than expected, but all look easy and safe to apply: - A fix for PCM core race spotted by fuzzing - ASoC topology fix for single DAI link - UAF fix for ASoC SOF Intel HD-audio at reloading - ASoC SOF Intel and Mediatek fixes - Trivial HD-audio quirks as usual" * tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Add new HP ZBook laptop with micmute led fixup ALSA: hda/realtek: Add support for HP Agusta using CS35L41 HDA ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10 ALSA: hda/realtek - restore auto-mute mode for Dell Chrome platform ALSA: pcm: Fix race of buffer access at PCM OSS layer ASoC: SOF: Intel: hda: Fix UAF when reloading module ASoc: SOF: topology: connect DAI to a single DAI link ASoC: SOF: Intel: hda-bus: Use PIO mode on ACE2+ platforms ASoC: SOF: ipc4-pcm: Delay reporting is only supported for playback direction ASoC: SOF: ipc4-control: Use SOF_CTRL_CMD_BINARY as numid for bytes_ext ASoC: mediatek: mt8188-mt6359: Depend on MT6359_ACCDET set or disabled ASoC: mediatek: mt8188-mt6359: select CONFIG_SND_SOC_MT6359_ACCDET
2025-05-22RDMA/rxe: Break endless pagefault loop for RO pagesLeon Romanovsky
RO pages have "perm" equal to 0, which caused such pages to be marked as needing a fault and led to an infinite loop. Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN") Reported-by: Daisuke Matsuda <dskmtsd@gmail.com> Closes: https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/ Link: https://patch.msgid.link/096fab178d48ed86942ee22eafe9be98e29092aa.1747913377.git.leonro@nvidia.com Tested-by: Daisuke Matsuda <dskmtsd@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-05-22Merge tag 'coresight-next-v6.16' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: updates for Linux v6.16 CoreSight self-hosted trace driver subsystem updates for Linux v6.16 includes: - Clear CLAIM tags on device probe if self-hosted tags are set. - Support for perf AUX pause/resume for CoreSight ETM PMU driver, with trace collection at pause. - Miscellaneous fixes for the subsystem Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (27 commits) coresight: prevent deactivate active config while enabling the config coresight: holding cscfg_csdev_lock while removing cscfg from csdev coresight/etm4: fix missing disable active config coresight: etm4x: Fix timestamp bit field handling coresight: tmc: fix failure to disable/enable ETF after reading Documentation: coresight: Document AUX pause and resume coresight: perf: Update buffer on AUX pause coresight: tmc: Re-enable sink after buffer update coresight: perf: Support AUX trace pause and resume coresight: etm4x: Hook pause and resume callbacks coresight: Introduce pause and resume APIs for source coresight: etm4x: Extract the trace unit controlling coresight: cti: Replace inclusion by struct fwnode_handle forward declaration coresight: Disable MMIO logging for coresight stm driver coresight: replicator: Fix panic for clearing claim tag coresight: Add a KUnit test for coresight_find_default_sink() coresight: Remove extern from function declarations coresight: Remove inlines from static function definitions coresight: Clear self hosted claim tag on probe coresight: etm3x: Convert raw base pointer to struct coresight access ...
2025-05-22Revert "drm/amd: Keep display off while going into S4"Mario Limonciello
commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") attempted to keep displays off during the S4 sequence by not resuming display IP. This however leads to hangs because DRM clients such as the console can try to access registers and cause a hang. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155 Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22ublk: run auto buf unregisgering in same io_ring_ctx with registeringMing Lei
UBLK_F_AUTO_BUF_REG requires that the automatically registered buffer is unregistered in the same `io_ring_ctx`, so check it explicitly. Document this requirement for UBLK_F_AUTO_BUF_REG. Drop the WARN_ON_ONCE(), which can be triggered from a userspace code path. Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reported-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522152043.399824-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
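The same-context requirement can be pictured as remembering an opaque context handle at registration time and comparing it at unregistration. The Python below is a hypothetical model, not the ublk code: `BufferTable` and its methods are invented for illustration, and plain `object()` instances stand in for per-context handles.

```python
class BufferTable:
    """Toy model: remembers which context registered each buffer index."""

    def __init__(self):
        self._owner = {}

    def register(self, ctx_handle, index):
        if index in self._owner:
            return False
        self._owner[index] = ctx_handle
        return True

    def unregister(self, ctx_handle, index):
        # Only the registering context may unregister. A mismatch is an
        # ordinary error returned to the caller, not an internal bug, since
        # userspace can cause it.
        if self._owner.get(index) is not ctx_handle:
            return False
        del self._owner[index]
        return True
```

Returning a failure for a mismatched context, rather than warning, matches the reasoning in the commit that the path is reachable from userspace.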
2025-05-22io_uring: add helper io_uring_cmd_ctx_handle()Ming Lei
Add helper io_uring_cmd_ctx_handle() for driver to track per-context resource, such as registered kernel io buffer. Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250522152043.399824-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22drm/amd/pm: Fetch partition metrics on SMUv13.0.12Lijo Lazar
Add support to fetch compute partition related metrics in SMUv13.0.12 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22Revert "drm/amd/display: [FW Promotion] Release 0.1.11.0"Aurabindo Pillai
This reverts commit 81fc9ca25f02c53c055b842a40f2a915bd0bd5e0 since it introduces incompatibility with older firmware. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang
When a drm node is closed and the vm is freed, seq64 memory is unmapped and freed. If a signal is received at that point, taking the vm lock fails and the seq64 va mapping leaks, and dmesg then shows the error "still active bo inside vm". Use the uninterruptible lock to fix the mapping leak and the dmesg error. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: update ras support checkMangesh Gadre
update ras support check for vcn 5.0.1 Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for jpeg 5.0.1Mangesh Gadre
Enable jpeg ras poison processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add jpeg poison status regMangesh Gadre
added registers to enable jpeg ras Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for vcn 5.0.1Mangesh Gadre
Enable vcn ras poison processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Add a new dcdebugmask to allow skip detection LTWayne Lin
Under specific embedded scenarios, we might still use the DP interface rather than the eDP interface. In such cases, detection link training is unnecessary. Add a new dcdebugmask value that can be used to skip the detection LT. Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Link: https://lore.kernel.org/amd-gfx/20250521063934.2111323-1-Wayne.Lin@amd.com/ Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: no 3D and blnd LUT as DPP color caps for DCN401Melissa Wen
Match what is declared as DPP color caps with hw caps. DCN401 has MPC shaper + 3D LUTs that are movable before and after blending (get from plane or stream), but no DPP blend LUTs. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: only collect data if debug gamut_remap is availableMelissa Wen
The color gamut_remap state log may not be available for some hw versions, so prevent a null pointer dereference by checking whether there is a function to collect data for this hw version. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
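The guard described here is the usual "optional hook" check: look up the per-hardware function and skip the work when it is absent. A minimal Python sketch, with the hook name `get_gamut_remap` and the `Dpp` class chosen purely for illustration (they are assumptions, not the driver's identifiers):

```python
class Dpp:
    """Toy hardware block whose function table may omit optional hooks."""

    def __init__(self, funcs):
        self.funcs = funcs


def collect_gamut_remap(dpp):
    """Collect the state only if this hardware version provides the hook."""
    hook = dpp.funcs.get("get_gamut_remap")  # hypothetical hook name
    if hook is None:
        return None  # hook absent on this hardware version: skip the log
    return hook()
```

Calling the hook unconditionally would be the equivalent of the null pointer dereference the commit prevents.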
2025-05-22drm/amdgpu: Remove duplicated "context still alive" checkTvrtko Ursulin
When amdgpu_ctx_mgr_fini() calls amdgpu_ctx_mgr_entity_fini() it contains the exact same "context still alive" check as it will do next. Remove the duplicated copy. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Make amdgpu_ctx_mgr_entity_fini staticTvrtko Ursulin
Function amdgpu_ctx_mgr_entity_fini() only has a single local caller, so let's make it static. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Update runtime pm checksAlex Deucher
Don't enable BACO when in passthrough. PCI resets don't work correctly when in BACO. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add vcn poison status regMangesh Gadre
added register to enable vcn ras Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use external link order for xgmi dataLijo Lazar
xgmi_port_num interface reports external link number for port number. To be consistent, use the external link number for reporting other XGMI link data also. v2: For invalid link number return -EINVAL (Kevin) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/radeon: fixing typo in macro nameJihed Chaibi
"ENABLE" is currently misspelled in SYS_INFO_GPUCAPS__ENABEL_DFS_BYPASS Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: fixing typo in macro nameJihed Chaibi
"ENABLE" is currently misspelled in SYS_INFO_GPUCAPS__ENABEL_DFS_BYPASS PS: checkpatch.pl is complaining about the presence of a space at the start of drivers/gpu/drm/amd/include/atomfirmware.h line: 1716 This is propably because this file uses (two) spaces and not tabs. Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: fix typo in commentsDaniil Ryabov
Fix double 'u' in 'frequuency' Signed-off-by: Daniil Ryabov <daniilryabov4@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Adjust set_value function with prefix to help in ftraceLeonardo Gomes
Adjust the set_value function in the hw_hpd.c file to have a prefix to help in ftrace; the name changes from 'set_value' to 'dal_hw_hpd_set_value'. Signed-off-by: Leonardo da Silva Gomes <leonardodasigomes@gmail.com> Co-developed-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/display: Adjust get_value function with prefix to help in ftraceLeonardo Gomes
Adjust the get_value function in the hw_hpd.c file to have a prefix to help in ftrace; the name changes from 'get_value' to 'dal_hw_hpd_get_value'. Signed-off-by: Leonardo da Silva Gomes <leonardodasigomes@gmail.com> Co-developed-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Derick Frias <derick.william.moraes@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Fetch partition metrics on SMUv13.0.6Lijo Lazar
Add support to fetch compute partition related metrics in SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add sysfs nodes for partitionLijo Lazar
Add sysfs nodes to provide compute partition specific data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Add support to query partition metricsLijo Lazar
Add interfaces to query compute partition related metrics data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram jpeg poison irqStanley.Yang
Register aqua vanjaram jpeg poison irq, add jpeg poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram vcn poison irqStanley.Yang
Register aqua vanjaram vcn poison irq, add vcn poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use macro to initialize metrics tableLijo Lazar
Helps to keep a build time check about usage of right datatype and avoids maintenance as new versions get added. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Fill pldm version for SMU v13.0.6 SOCsAsad Kamal
Fetch pldm version from static metrics table for SMU v13.0.6 SOCs Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Update pmfw headers for smu_v_13_0_6Asad Kamal
Update pmfw headers for smu_v_13_0_6 to include pldm version as part of static metrics table Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22 drm/amdgpu: Fix eviction fence worker race during fd close (Jesse.Zhang)
The current cleanup order during file descriptor close can lead to a race condition where the eviction fence worker attempts to access a destroyed mutex from the user queue manager:

[ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564
[ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]

The issue occurs because:
1. We destroy the user queue manager (including its mutex) first
2. Then try to destroy eviction fences which may have pending work
3. The eviction fence worker may try to access the already-destroyed mutex

Fix this by reordering the cleanup to:
1. First mark the fd as closing and destroy eviction fences, which flushes any pending work
2. Then safely destroy the user queue manager after we're certain no more fence work will be executed

The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
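The ordering problem described in the commit above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: every `demo_*` name is hypothetical, the "mutex" is reduced to a validity flag, the flush is modeled by running the pending worker synchronously, and the "mark the fd as closing" step is not modeled. It is not the amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the amdgpu structures. */
struct demo_queue_mgr {
    bool lock_valid;            /* models "mutex initialized, not yet destroyed" */
};

struct demo_fence_mgr {
    struct demo_queue_mgr *qm;  /* manager whose lock the worker would take */
    bool work_pending;          /* a worker run is still queued */
    int bad_lock_uses;          /* worker runs that saw a destroyed lock */
};

/* Models the worker: it needs the queue manager's lock to be alive. */
static void demo_fence_worker(struct demo_fence_mgr *fm)
{
    assert(fm != NULL && fm->qm != NULL);
    if (!fm->qm->lock_valid)
        fm->bad_lock_uses++;    /* this is the access the warning complains about */
    fm->work_pending = false;
}

/* Models fence teardown: pending work is flushed (run to completion) first. */
static void demo_fence_destroy(struct demo_fence_mgr *fm)
{
    if (fm->work_pending)
        demo_fence_worker(fm);
}

/* Models queue manager teardown: the lock is gone afterwards. */
static void demo_queue_mgr_fini(struct demo_queue_mgr *qm)
{
    qm->lock_valid = false;
}

/* Runs one close sequence and returns how many bad lock accesses occurred. */
static int demo_close(bool fixed_order)
{
    struct demo_queue_mgr qm = { .lock_valid = true };
    struct demo_fence_mgr fm = { .qm = &qm, .work_pending = true };

    if (fixed_order) {
        demo_fence_destroy(&fm);    /* flush fence work while the lock exists */
        demo_queue_mgr_fini(&qm);   /* then destroy the lock */
    } else {
        demo_queue_mgr_fini(&qm);   /* lock destroyed first ... */
        demo_fence_destroy(&fm);    /* ... so the flushed worker sees a dead lock */
    }
    return fm.bad_lock_uses;
}
```

In this model `demo_close(true)` reports no bad accesses, while `demo_close(false)` reports one, which mirrors why the fix destroys the eviction fences before the user queue manager.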
2025-05-22 drm/amdgpu: lock the eviction fence for wq signals it (Prike Liang)
Lock the eviction fence and take a reference to it before the work queue scheduled for that fence tries to signal it.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22 drm/amdkfd: Change svm_range_get_info return type (Andrey Vatoropin)
Static analysis shows that pointer "svms" cannot be NULL because it points to the object "struct svm_range_list". Remove the extra NULL check. It is meaningless and harms the readability of the code.

In the function svm_range_get_info() there is no possibility of failure, so its caller does not need a return value. Change the return type of svm_range_get_info() from "int" to "void".

Since svm_range_get_info() now has a return type of "void", the caller no longer needs to handle a return value. Delete the extra code.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
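The refactoring pattern in the commit above (drop a NULL check that can never fire, then change the return type to void and simplify the caller) can be sketched as follows. All `demo_*` names are hypothetical, and the assumption that the pointer is the address of an embedded struct member is made for illustration; this is not the amdkfd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: the range list is embedded in its owner. */
struct demo_range_list {
    size_t nranges;
};

struct demo_proc {
    struct demo_range_list svms;
};

/* Before: returns int, but the only failure path is unreachable because
 * the address of an embedded member of a valid object is never NULL. */
static int demo_get_info_old(struct demo_proc *p, size_t *count)
{
    struct demo_range_list *svms = &p->svms;

    if (!svms)                  /* dead check */
        return -1;
    *count = svms->nranges;
    return 0;
}

/* After: no failure is possible, so the function returns void. */
static void demo_get_info(struct demo_proc *p, size_t *count)
{
    assert(p != NULL && count != NULL);
    *count = p->svms.nranges;
}

/* The caller no longer carries error-handling code for a call that
 * cannot fail. */
static size_t demo_caller(struct demo_proc *p)
{
    size_t count;

    demo_get_info(p, &count);
    return count;
}
```

Both versions produce the same count for a valid object; the difference is only that the new signature stops advertising an error that cannot occur.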
2025-05-22 EDAC/bluefield: Don't use bluefield_edac_readl() result on error (David Thompson)
The bluefield_edac_readl() routine returns an uninitialized result on error paths. In those cases the calling routine should not use the uninitialized result. The driver should simply log the error, and then return early.
Fixes: e41967575474 ("EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Link: https://lore.kernel.org/20250318214747.12271-1-davthompson@nvidia.com
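The "log the error, then return early" pattern from the commit above can be sketched like this. The helper names, the register value, and the mask are hypothetical and chosen only for illustration; this is not the bluefield_edac driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical read helper: returns 0 on success and a negative value on
 * failure. On failure it does NOT write *result. */
static int demo_readl(int simulate_failure, uint32_t *result)
{
    assert(result != NULL);
    if (simulate_failure)
        return -1;              /* *result is left untouched on this path */
    *result = 0x1234u;          /* made-up register contents */
    return 0;
}

/* Returns the low byte of the register, or 0 if the read failed. */
static unsigned int demo_check_errors(int simulate_failure)
{
    uint32_t reg;               /* uninitialized until demo_readl() succeeds */
    int err;

    err = demo_readl(simulate_failure, &reg);
    if (err) {
        /* Log and bail out; reg must not be read past this point. */
        fprintf(stderr, "demo: register read failed: %d\n", err);
        return 0;
    }
    return reg & 0xffu;
}
```

The key point is that `reg` is only read on the success path, so a failed read can never feed stack garbage into later decoding.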
2025-05-22 Merge branch 's390-bpf-use-kernel-s-expoline-thunks' (Alexei Starovoitov)
Ilya Leoshkevich says:

====================
This series simplifies the s390 JIT by replacing the generation of expolines (Spectre mitigation) with using the ones from the kernel text. This is possible thanks to the V!=R s390 kernel rework.

Patch 1 is a small prerequisite for arch/s390 that I would like to get in via the BPF tree. It has Heiko's Acked-by.

Patches 2 and 3 are the implementation.
====================

Link: https://patch.msgid.link/20250519223646.66382-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>