summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-06-18drm/amd/display: Add dc cap for dp tunnelingPeichen Huang
[WHAT] 1. add dc cap for dp tunneling 2. add function to get index of host router Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 29e178d13979cf6fdb42c5fe2dfec2da2306c4ad) Cc: stable@vger.kernel.org
2025-06-18drm/amdkfd: move SDMA queue reset capability check to node_showJesse.Zhang
Relocate the per-SDMA queue reset capability check from kfd_topology_set_capabilities() to node_show() to ensure we read the latest value of sdma.supported_reset after all IP blocks are initialized. Fixes: ceb7114c961b ("drm/amdkfd: flag per-sdma queue reset supported to user space") Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e17df7b086cf908cedd919f448da9e00028419bb)
2025-06-18PCI: pciehp: Ignore belated Presence Detect Changed caused by DPCLukas Wunner
Commit c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by DPC") sought to ignore Presence Detect Changed events occurring as a side effect of Downstream Port Containment. The commit awaits recovery from DPC and then clears events which occurred in the meantime. However if the first event seen after DPC is Data Link Layer State Changed, only that event is cleared and not Presence Detect Changed. The object of the commit is thus defeated. That's because pciehp_ist() computes the events to clear based on the local "events" variable instead of "ctrl->pending_events". The former contains the events that had occurred when pciehp_ist() was entered, whereas the latter also contains events that have accumulated while awaiting DPC recovery. In practice, the order of PDC and DLLSC events is arbitrary and the delay in-between can be several milliseconds. So change the logic to always clear PDC events, even if they come after an initial DLLSC event. Fixes: c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by DPC") Reported-by: Lương Việt Hoàng <tcm4095@gmail.com> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765#c165 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Lương Việt Hoàng <tcm4095@gmail.com> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com> Link: https://patch.msgid.link/d9c4286a16253af7e93eaf12e076e3ef3546367a.1750257164.git.lukas@wunner.de
2025-06-18gpio: mlxbf3: only get IRQ for device instance 0David Thompson
The gpio-mlxbf3 driver interfaces with two GPIO controllers, device instance 0 and 1. There is a single IRQ resource shared between the two controllers, and it is found in the ACPI table for device instance 0. The driver should not attempt to get an IRQ resource when probing device instance 1, otherwise the following error is logged: mlxbf3_gpio MLNXBF33:01: error -ENXIO: IRQ index 0 not found Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com> Fixes: cd33f216d241 ("gpio: mlxbf3: Add gpio driver support") Link: https://lore.kernel.org/r/20250613163443.1065217-1-davthompson@nvidia.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-06-18mtd: spinand: winbond: Prevent unsupported frequencies on dual/quad I/O variantsMiquel Raynal
Dual and quad capable chips natively support dual and quad I/O variants at up to 104MHz (1-2-2 and 1-4-4 operations). Reaching the maximum speed of 166MHz is theoretically possible (while still unsupported in the field) by adding a few more dummy cycles. Let's be accurate and clearly state this limit. Setting a maximum frequency implies adding the frequency parameter to the macro, which is done using a variadic argument to avoid impacting all the other drivers which already make use of this macro. Fixes: 1ea808b4d15b ("mtd: spinand: winbond: Update the *JW chip definitions") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-06-18mtd: spinand: winbond: Increase maximum frequency on an octal operationMiquel Raynal
The default number of dummy cycles is 16 in octal I/O mode (1S-8S-8S), and with this default configuration the maximum frequency is higher than what is being advertised. There are higher and lower frequency possibilities, which involve making changes in the number of dummy cycles through the VCR register. At this stage, let's just describe the default configuration correctly. There should be no functional change. Fixes: 1ac5ff2f2ad6 ("mtd: spinand: winbond: Add octal support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-06-18mtd: spinand: winbond: Fix W35N number of planes/LUNMiquel Raynal
There's been a mistake when extracting the geometry of the W35N02 and W35N04 chips from the datasheet. There is a single plane, however there are respectively 2 and 4 LUNs. They are actually referred in the datasheet as dies (equivalent of target), but as there is no die select operation and the chips only feature a single configuration register for the entire chip (instead of one per die), we can reasonably assume we are talking about LUNs and not dies. Reported-by: Andreas Dannenberg <dannenberg@ti.com> Suggested-by: Vignesh Raghavendra <vigneshr@ti.com> Fixes: 25e08bf66660 ("mtd: spinand: winbond: Add support for W35N02JW and W35N04JW chips") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-06-18Revert "mtd: core: always create master device"Miquel Raynal
The idea behind this patch was to always let a "master" mtd device available to anchor runtime PM. Historically, there was no mtd device representing the whole storage as soon as partitions were coming into play. The introduction of CONFIG_MTD_PARTITIONED_MASTER allowed to keep this "master" device, but was not enabled by default to avoid breaking existing users (otherwise the mtd device numbering would be totally messed up with an off by 1, at least). The approach of adding an mtd_master class on top of partitioned mtd devices is breaking the mtd core in many creative ways, so better think again this approach and revert the faulty changes for now. This reverts commit 0aa7b390fc40a871267a2328bbbefca8b37ad307. Fixes: 0aa7b390fc40 ("mtd: core: always create master device") Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-06-18Merge tag 'iwlwifi-fixes-2025-06-18' of ↵Johannes Berg
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== fixes for 6.16-rc3 ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-18wifi: iwlwifi: Fix incorrect logic on cmd_ver range checkingColin Ian King
The current cmd_ver range checking on cmd_ver < 1 && cmd_ver > 3 can never be true because the logical operator && is being used, cmd_ver can never be less than 1 and also greater than 3. Fix this by using the logical || operator. Fixes: df6146a0296e ("wifi: iwlwifi: Add a new version for mac config command") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250522121703.2766764-1-colin.i.king@gmail.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-06-18wifi: iwlwifi: dvm: restore n_no_reclaim_cmds settingJohannes Berg
Apparently I accidentally removed this setting in my transport configuration rework, leading to an endless stream of warnings from the PCIe code when relevant notifications are received by the driver from firmware. Restore it. Reported-by: Woody Suwalski <terraluna977@gmail.com> Closes: https://lore.kernel.org/r/e8c45d32-6129-8a5e-af80-ccc42aaf200b@gmail.com/ Fixes: 08e77d5edf70 ("wifi: iwlwifi: rework transport configuration") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250616134902.222342908ca4.I47a551c86cbc0e9de4f980ca2fd0d67bf4052e50@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-06-18wifi: iwlwifi: cfg: Limit cb_size to valid rangePei Xiao
on arm64 defconfig build failed with gcc-8: drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c:208:3: include/linux/bitfield.h:195:3: error: call to '__field_overflow' declared with attribute error: value doesn't fit into mask __field_overflow(); \ ^~~~~~~~~~~~~~~~~~ include/linux/bitfield.h:215:2: note: in expansion of macro '____MAKE_OP' ____MAKE_OP(u##size,u##size,,) ^~~~~~~~~~~ include/linux/bitfield.h:218:1: note: in expansion of macro '__MAKE_OP' __MAKE_OP(32) Limit cb_size to valid range to fix it. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Link: https://patch.msgid.link/7b373a4426070d50b5afb3269fd116c18ce3aea8.1748332709.git.xiaopei01@kylinos.cn Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-06-18wifi: iwlwifi: restore missing initialization of async_handlers_list (again)Miri Korenblit
The initialization of async_handlers_list was accidentally removed in a previous change. Then it was restoted by commit 175e69e33c66 ("wifi: iwlwifi: restore missing initialization of async_handlers_list"). Somehow, the initialization disappeared again. Restote it. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-06-18Merge tag 'ath-current-20250617' of ↵Johannes Berg
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git updates for v6.16-rc3 Fix the following 3 issues: wifi: ath12k: avoid burning CPU while waiting for firmware stats wifi: ath12k: don't activate more links than firmware supports wifi: carl9170: do not ping device which has failed to load firmware ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-18wifi: ath6kl: remove WARN on bad firmware inputJohannes Berg
If the firmware gives bad input, that's nothing to do with the driver's stack at this point etc., so the WARN_ON() doesn't add any value. Additionally, this is one of the top syzbot reports now. Just print a message, and as an added bonus, print the sizes too. Reported-by: syzbot+92c6dd14aaa230be6855@syzkaller.appspotmail.com Tested-by: syzbot+92c6dd14aaa230be6855@syzkaller.appspotmail.com Acked-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Link: https://patch.msgid.link/20250617114529.031a677a348e.I58bf1eb4ac16a82c546725ff010f3f0d2b0cca49@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-17atm: atmtcp: Free invalid length skb in atmtcp_c_send().Kuniyuki Iwashima
syzbot reported the splat below. [0] vcc_sendmsg() copies data passed from userspace to skb and passes it to vcc->dev->ops->send(). atmtcp_c_send() accesses skb->data as struct atmtcp_hdr after checking if skb->len is 0, but it's not enough. Also, when skb->len == 0, skb and sk (vcc) were leaked because dev_kfree_skb() is not called and sk_wmem_alloc adjustment is missing to revert atm_account_tx() in vcc_sendmsg(), which is expected to be done in atm_pop_raw(). Let's properly free skb with an invalid length in atmtcp_c_send(). [0]: BUG: KMSAN: uninit-value in atmtcp_c_send+0x255/0xed0 drivers/atm/atmtcp.c:294 atmtcp_c_send+0x255/0xed0 drivers/atm/atmtcp.c:294 vcc_sendmsg+0xd7c/0xff0 net/atm/common.c:644 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x330/0x3d0 net/socket.c:727 ____sys_sendmsg+0x7e0/0xd80 net/socket.c:2566 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2620 __sys_sendmsg net/socket.c:2652 [inline] __do_sys_sendmsg net/socket.c:2657 [inline] __se_sys_sendmsg net/socket.c:2655 [inline] __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2655 x64_sys_call+0x32fb/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4154 [inline] slab_alloc_node mm/slub.c:4197 [inline] kmem_cache_alloc_node_noprof+0x818/0xf00 mm/slub.c:4249 kmalloc_reserve+0x13c/0x4b0 net/core/skbuff.c:579 __alloc_skb+0x347/0x7d0 net/core/skbuff.c:670 alloc_skb include/linux/skbuff.h:1336 [inline] vcc_sendmsg+0xb40/0xff0 net/atm/common.c:628 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x330/0x3d0 net/socket.c:727 ____sys_sendmsg+0x7e0/0xd80 net/socket.c:2566 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2620 __sys_sendmsg net/socket.c:2652 [inline] __do_sys_sendmsg net/socket.c:2657 [inline] __se_sys_sendmsg net/socket.c:2655 [inline] __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2655 x64_sys_call+0x32fb/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 5798 Comm: syz-executor192 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(undef) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1d3c235276f62963e93a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1d3c235276f62963e93a Tested-by: syzbot+1d3c235276f62963e93a@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250616182147.963333-2-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Merge tag 'drm-msm-fixes-2025-06-16' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.16-rc3 Display: - Fixed DP output on SDM845 - Fixed 10nm DSI PLL init GPU: - SUBMIT ioctl error path leak fixes - drm half of stall-on-fault fixes. Note there is a soft dependency, to get correct mmu fault devcoredumps, on arm-smmu changes which are not in this branch, but have already been merged by Linus. So by the time Linus merges this, everything should be peachy. - a7xx: Missing CP_RESET_CONTEXT_STATE - Skip GPU component bind if GPU is not in the device table. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <rob.clark@oss.qualcomm.com> Link: https://lore.kernel.org/r/CACSVV03=OH74ip8O1xqb8RJWGyM4HFuUnWuR=p3zJR+-ko_AJA@mail.gmail.com
2025-06-17wifi: carl9170: do not ping device which has failed to load firmwareDmitry Antipov
Syzkaller reports [1, 2] crashes caused by an attempts to ping the device which has failed to load firmware. Since such a device doesn't pass 'ieee80211_register_hw()', an internal workqueue managed by 'ieee80211_queue_work()' is not yet created and an attempt to queue work on it causes null-ptr-deref. [1] https://syzkaller.appspot.com/bug?extid=9a4aec827829942045ff [2] https://syzkaller.appspot.com/bug?extid=0d8afba53e8fb2633217 Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Christian Lamparter <chunkeey@gmail.com> Link: https://patch.msgid.link/20250616181205.38883-1-dmantipov@yandex.ru Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: don't wait when there is no vdev startedBaochen Qiang
For WMI_REQUEST_VDEV_STAT request, firmware might split response into multiple events dut to buffer limit, hence currently in ath12k_wmi_fw_stats_process() host waits until all events received. In case there is no vdev started, this results in that below condition would never get satisfied ((++ar->fw_stats.num_vdev_recvd) == total_vdevs_started) consequently the requestor would be blocked until time out. The same applies to WMI_REQUEST_BCN_STAT request as well due to: ((++ar->fw_stats.num_bcn_recvd) == ar->num_started_vdevs) Change to check the number of started vdev first: if it is zero, finish directly; if not, follow the old way. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1 Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware") Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-4-12f594f3b857@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()Baochen Qiang
Currently ath12k_wmi_fw_stats_process() is using static variables to count firmware stat events. Taking num_vdev as an example, if for whatever reason (say ar->num_started_vdevs is 0 or firmware bug etc.) the following condition (++num_vdev) == total_vdevs_started is not met, is_end is not set thus num_vdev won't be cleared. Next time when firmware stats is requested again, even if everything is working fine, failure is expected due to the condition above will never be satisfied. The same applies to num_bcn as well. Change to use non-static counters and reset them each time before firmware stats is requested. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1 Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware") Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-3-12f594f3b857@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: avoid burning CPU while waiting for firmware statsBaochen Qiang
ath12k_mac_get_fw_stats() is busy polling fw_stats_done flag while waiting firmware finishing sending all events. This is not good as CPU is monopolized and kept burning during the wait. Change to the completion mechanism to fix it. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1 Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware") Reported-by: Grégoire Stein <gregoire.s93@live.fr> Closes: https://lore.kernel.org/ath12k/AS8P190MB120575BBB25FCE697CD7D4988763A@AS8P190MB1205.EURP190.PROD.OUTLOOK.COM/ Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Tested-by: Grégoire Stein <gregoire.s93@live.fr> Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-2-12f594f3b857@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: fix documentation on firmware statsBaochen Qiang
Regarding the firmware stats events handling, the comment in ath12k_mac_get_fw_stats() says host determines whether all events have been received based on 'end' tag in TLV. This is wrong as there is no such tag at all, actually host makes the decision totally by itself based on the stats type and active pdev/vdev counts etc. Fix it to correctly reflect the logic. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-1-12f594f3b857@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: don't activate more links than firmware supportsBaochen Qiang
In case of ML connection, currently all useful links are activated at ASSOC stage: ieee80211_set_active_links(vif, ieee80211_vif_usable_links(vif)) this results in firmware crash when the number of links activated on the same device is more than supported. Since firmware supports activating at most 2 links for a ML connection, to avoid firmware crash, host needs to select 2 links out of the useful links. As the assoc link has already been chosen, the question becomes how to determine partner links. A straightforward principle applied here is that the resulted combination should achieve the best throughput. For that purpose, ideally various factors like bandwidth, RSSI etc should be considered. But that would be too complicate. To make it easy, the choice is to only take hardware modes into consideration. The SBS (single band simultaneously) mode frequency range covers 5 GHz and 6 GHz bands. In this mode, the two individual MACs are both active, with one working on 5g-high band and the other on 5g-low band (from hardware perspective 5 GHz and 6 GHz bands are referred to as a 'large' single 5 GHz band). The DBS (dual band simultaneously) mode covers 2 GHz band and the 'large' 5 GHz band, with one MAC working on 2 GHz band and the other working on 5 GHz band or 6 GHz band. Since 5,6 GHz bands could provide higher bandwidth than 2 GHz band, the preference is given to SBS mode. Other hardware modes results in only one working MAC at any given time, so it is chosen only when both SBS are DBS are not possible. For each hardware mode, if there are more than one partner candidate, just choose the first one. For now only single device MLO case is handled as it is easy. Other cases could be addressed in the future. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-6-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: update link active in case two links fall on the same MACBaochen Qiang
In case of two links established on the same device in an ML connection, depending on device's hardware mode capability, it is possible that both links fall on the same MAC. Currently, no specific action is taken to address this but just keep both links active. However this would result in lower throughput compared to even one link, because switching between these two links on the resulted MAC significantly impacts throughput. Check if both links fall in the frequency range of a single MAC. If so, send WMI_MLO_LINK_SET_ACTIVE_CMDID command to firmware such that firmware can deactivate one of them. Note the decision of which link getting deactivated is made by firmware, host only sends the vdev lists. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-5-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID commandBaochen Qiang
Add WMI_MLO_LINK_SET_ACTIVE_CMDID command. This command allows host to send required link information to firmware such that firmware can make decision on activating/deactivating links in various scenarios. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-4-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: update freq range for each hardware modeBaochen Qiang
Previous patches parse and save hardware MAC frequency range information in ath12k_svc_ext_info structure. Such range represents hardware capability hence needs to be updated based on host information, e.g. guard the range based on host's low/high boundary. So update frequency range. The updated range is saved in ath12k_hw_mode_info structure and would be used when doing vdev activation and link selection in following patches. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-3-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: parse and save sbs_lower_band_end_freq from ↵Baochen Qiang
WMI_SERVICE_READY_EXT2_EVENTID event Firmware sends the boundary between lower and higher bands in ath12k_wmi_dbs_or_sbs_cap_params structure embedded in WMI_SERVICE_READY_EXT2_EVENTID event. The boundary is needed when updating frequency range in the following patch. So parse and save it for later use. Note ath12k_wmi_dbs_or_sbs_cap_params is placed after some other structures, so placeholders for them are added as well. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-2-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: parse and save hardware mode info from ↵Baochen Qiang
WMI_SERVICE_READY_EXT_EVENTID event for later use WLAN hardware might support various hardware modes such as DBS (dual band simultaneously), SBS (single band simultaneously) and DBS_OR_SBS etc, see enum wmi_host_hw_mode_config_type. Firmware advertises actual supported modes in WMI_SERVICE_READY_EXT_EVENTID event. For each mode, firmware advertises frequency range each hardware MAC can operate on. In MLO case such information is necessary during vdev activation and link selection (which is done in following patches), so add a new structure ath12k_svc_ext_info to ath12k_wmi_base, then parse and save those information to it for later use. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-1-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STATBjorn Andersson
When the ath12k driver is built without CONFIG_ATH12K_DEBUG, the recently refactored stats code can cause any user space application (such at NetworkManager) to consume 100% CPU for 3 seconds, every time stats are read. Commit 'b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of debugfs")' moved ath12k_debugfs_fw_stats_request() out of debugfs, by merging the additional logic into ath12k_mac_get_fw_stats(). Among the added responsibility of ath12k_mac_get_fw_stats() was the busy-wait for `fw_stats_done`. Signalling of `fw_stats_done` happens when one of the WMI_REQUEST_PDEV_STAT, WMI_REQUEST_VDEV_STAT, and WMI_REQUEST_BCN_STAT messages are received, but the handling of the latter two commands remained in the debugfs code. As `fw_stats_done` isn't signalled, the calling processes will spin until the timeout (3 seconds) is reached. Moving the handling of these two additional responses out of debugfs resolves the issue. Fixes: b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of debugfs") Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Tested-by: Abel Vesa <abel.vesa@linaro.org> Link: https://patch.msgid.link/20250609-ath12k-fw-stats-done-v1-1-2b3624656697@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17ptp: allow reading of currently dialed frequency to succeed on free-running ↵Vladimir Oltean
clocks There is a bug in ptp_clock_adjtime() which makes it refuse the operation even if we just want to read the current clock dialed frequency, not modify anything (tx->modes == 0). That should be possible even if the clock is free-running. For context, the kernel UAPI is the same for getting and setting the frequency of a POSIX clock. For example, ptp4l errors out at clock_create() -> clockadj_get_freq() -> clock_adjtime() time, when it should logically only have failed on actual adjustments to the clock, aka if the clock was configured as slave. But in master mode it should work. This was discovered when examining the issue described in the previous commit, where ptp_clock_freerun() returned true despite n_vclocks being zero. Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250613174749.406826-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17ptp: fix breakage after ptp_vclock_in_use() reworkVladimir Oltean
What is broken -------------- ptp4l, and any other application which calls clock_adjtime() on a physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()"). Explanation for the breakage ---------------------------- The blamed commit was based on the false assumption that ptp_vclock_in_use() callers already test for n_vclocks prior to calling this function. This is notably incorrect for the code path below, in which there is, in fact, no n_vclocks test: ptp_clock_adjtime() -> ptp_clock_freerun() -> ptp_vclock_in_use() The result is that any clock adjustment on any physical clock is now impossible. This is _despite_ there not being any vclock over this physical clock. $ ptp4l -i eno0 -2 -P -m ptp4l[58.425]: selected /dev/ptp0 as PTP clock [ 58.429749] ptp: physical clock is free running ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy failed to create a clock $ cat /sys/class/ptp/ptp0/n_vclocks 0 The patch makes the ptp_vclock_in_use() function say "if it's not a virtual clock, then this physical clock does have virtual clocks on top". Then ptp_clock_freerun() uses this information to say "this physical clock has virtual clocks on top, so it must stay free-running". Then ptp_clock_adjtime() uses this information to say "well, if this physical clock has to be free-running, I can't do it, return -EBUSY". Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the test whether vclocks are in use. What did the blamed commit intend to fix ---------------------------------------- The blamed commit presents a lockdep warning stating "possible recursive locking detected", with the n_vclocks_store() and ptp_clock_unregister() functions involved. The recursive locking seems this: n_vclocks_store() -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1 -> device_for_each_child_reverse(..., unregister_vclock) -> unregister_vclock() -> ptp_vclock_unregister() -> ptp_clock_unregister() -> ptp_vclock_in_use() -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2 The issue can be triggered by creating and then deleting vclocks: $ echo 2 > /sys/class/ptp/ptp0/n_vclocks $ echo 0 > /sys/class/ptp/ptp0/n_vclocks But note that in the original stack trace, the address of the first lock is different from the address of the second lock. This is because at step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent (physical) PTP clock, and at step 2, the lock is of the child (virtual) PTP clock. They are different locks of different devices. In this situation there is no real deadlock, the lockdep warning is caused by the fact that the mutexes have the same lock class on both the parent and the child. Functionally it is fine. Proposed alternative solution ----------------------------- We must reintroduce the body of ptp_vclock_in_use() mostly as it was structured prior to the blamed commit, but avoid the lockdep warning. Based on the fact that vclocks cannot be nested on top of one another (ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we already know that ptp->n_vclocks is zero for a virtual clock. And ptp->is_virtual_clock is a runtime invariant, established at ptp_clock_register() time and never changed. There is no need to serialize on any mutex in order to read ptp->is_virtual_clock, and we take advantage of that by moving it outside the lock. Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at all, and step 2 in the code walkthrough above can simply go away. We can simply return false to the question "ptp_vclock_in_use(a virtual clock)". Other notes ----------- Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns execution seems racy, because the returned value can become stale as soon as the function returns and before the return value is used (i.e. n_vclocks_store() can run any time). The locking requirement should somehow be transferred to the caller, to ensure a longer life time for the returned value, but this seems out of scope for this severe bug fix. Because we are also fixing up the logic from the original commit, there is another Fixes: tag for that. Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()") Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250613174749.406826-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Update MRU and RSS table of RSS contexts on queue resetPavan Chebbi
The commit under the Fixes tag below which updates the VNICs' RSS and MRU during .ndo_queue_start(), needs to be extended to cover any non-default RSS contexts which have their own VNICs. Without this step, packets that are destined to a non-default RSS context may be dropped after .ndo_queue_start(). We further optimize this scheme by updating the VNIC only if the RX ring being restarted is in the RSS table of the VNIC. Updating the VNIC (in particular setting the MRU to 0) will momentarily stop all traffic to all rings in the RSS table. Any VNIC that has the RX ring excluded from the RSS table can skip this step and avoid the traffic disruption. Note that this scheme is just an improvement. A VNIC with multiple rings in the RSS table will still see traffic disruptions to all rings in the RSS table when one of the rings is being restarted. We are working on a FW scheme that will improve upon this further. Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table") Reported-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Add a helper function to configure MRU and RSSPavan Chebbi
Add a new helper function that will configure MRU and RSS table of a VNIC. This will be useful when we configure both on a VNIC when resetting an RX ring. This function will be used again in the next bug fix patch where we have to reconfigure VNICs for RSS contexts. Suggested-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()Kalesh AP
Before the commit under the Fixes tag below, bnxt_ulp_stop() and bnxt_ulp_start() were always invoked in pairs. After that commit, the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop() has been called. This may result in the RoCE driver's aux driver .suspend() method being invoked twice. The 2nd bnxt_re_suspend() call will crash when it dereferences a NULL pointer: (NULL ib_device): Handle device suspend call BUG: kernel NULL pointer dereference, address: 0000000000000b78 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary) Tainted: [S]=CPU_OUT_OF_SPEC Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en] RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re] Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00 RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070 R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> bnxt_ulp_stop+0x69/0x90 [bnxt_en] bnxt_sp_task+0x678/0x920 [bnxt_en] ? __schedule+0x514/0xf50 process_scheduled_works+0x9d/0x400 worker_thread+0x11c/0x260 ? __pfx_worker_thread+0x10/0x10 kthread+0xfe/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag is already set. This will preserve the original symmetrical bnxt_ulp_stop() and bnxt_ulp_start(). Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED flag after taking the mutex to avoid any race condition. And for symmetry, only proceed in bnxt_ulp_start() if the BNXT_EN_FLAG_ULP_STOPPED is set. Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17Merge tag 'linux-can-fixes-for-6.16-20250617' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-06-17 The patch is by Brett Werling, and fixes the power regulator retrieval during probe of the tcan4x5x glue code for the m_can driver. * tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: fix power regulator retrieval during probe openvswitch: Allocate struct ovs_pcpu_storage dynamically ==================== Link: https://patch.msgid.link/20250617155123.2141584-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17net: ti: icssg-prueth: Fix packet handling for XDP_TXMeghana Malladi
While transmitting XDP frames for XDP_TX, page_pool is used to get the DMA buffers (already mapped to the pages) and need to be freed/reycled once the transmission is complete. This need not be explicitly done by the driver as this is handled more gracefully by the xdp driver while returning the xdp frame. __xdp_return() frees the XDP memory based on its memory type, under which page_pool memory is also handled. This change fixes the transmit queue timeout while running XDP_TX. logs: [ 309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms [ 313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms [ 319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms ... Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250616063319.3347541-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17Merge tag 'platform-drivers-x86-v6.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - amd/hsmp: Timeout handling fixes - amd/pmc: - Clear metrics table at start of cycle - Add PCSpecialist Lafite Pro V 14M to 8042 quirks list - amd/pmf: Fix error handling corner cases (nth attempt) - alienware-wmi-wmax: Revert G-Mode support as it lowers performance - dell_rbu: - Fix sparse lock context warning - Fix list head usage - Don't overwrite data buffer past the size of the last packet - ideapad-laptop: Ensure EC is not polled too frequently - intel-uncore-freq: - Fail module load when plat_info is NULL - Avoid a non-literal format string as it triggers a compiler warning - intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry - intel/power-domains: Fix error code in tpmi_init() - samsung-galaxybook: Add support for Notebook 9 Pro and others (SAM0426) * tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1" platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list platform/x86/intel-uncore-freq: avoid non-literal format string platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry MAINTAINERS: .mailmap: Update Hans de Goede's email address platform/x86: dell_rbu: Bump version platform/x86: dell_rbu: Stop overwriting data buffer platform/x86: dell_rbu: Fix list usage platform/x86: dell_rbu: Fix lock context warning platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc() platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice platform/x86/amd: pmf: Use device managed allocations x86/platform/amd: replace down_timeout() with down_interruptible() x86/platform/amd: move final timeout check to after final sleep platform/x86/amd: pmc: Clear metrics table at start of cycle platform/x86/intel: power-domains: Fix error code in tpmi_init() platform/x86: samsung-galaxybook: Add SAM0426 platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL platform/x86: ideapad-laptop: use usleep_range() for EC polling
2025-06-17e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13Vitaly Lifshits
On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in the software STRAP is incorrect. This causes the PTP timer to run at the wrong rate and can lead to synchronization issues. The STRAP value is configured by the system firmware, and a firmware update is not always possible. Since the XTAL clock on these systems always runs at 38.4MHz, the driver may ignore the STRAP and just set the correct value. Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Reviewed-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17ice: fix eswitch code memory leak in reset scenarioGrzegorz Nitka
Add simple eswitch mode checker in attaching VF procedure and allocate required port representor memory structures only in switchdev mode. The reset flows triggers VF (if present) detach/attach procedure. It might involve VF port representor(s) re-creation if the device is configured is switchdev mode (not legacy one). The memory was blindly allocated in current implementation, regardless of the mode and not freed if in legacy mode. Kmemeleak trace: unreferenced object (percpu) 0x7e3bce5b888458 (size 40): comm "bash", pid 1784, jiffies 4295743894 hex dump (first 32 bytes on cpu 45): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): pcpu_alloc_noprof+0x4c4/0x7c0 ice_repr_create+0x66/0x130 [ice] ice_repr_create_vf+0x22/0x70 [ice] ice_eswitch_attach_vf+0x1b/0xa0 [ice] ice_reset_all_vfs+0x1dd/0x2f0 [ice] ice_pci_err_resume+0x3b/0xb0 [ice] pci_reset_function+0x8f/0x120 reset_store+0x56/0xa0 kernfs_fop_write_iter+0x120/0x1b0 vfs_write+0x31c/0x430 ksys_write+0x61/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Testing hints (ethX is PF netdev): - create at least one VF echo 1 > /sys/class/net/ethX/device/sriov_numvfs - trigger the reset echo 1 > /sys/class/net/ethX/device/reset Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17net: ice: Perform accurate aRFS flow matchKrishna Kumar
This patch fixes an issue seen in a large-scale deployment under heavy incoming pkts where the aRFS flow wrongly matches a flow and reprograms the NIC with wrong settings. That mis-steering causes RX-path latency spikes and noisy neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections). set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by hashing the skb sized by the per rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver steer function modifies the wrong flow to use this rx-queue, e.g.: Flow#1 is first added: Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10 Later when a new flow needs to be added: Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20 The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This results in both flows getting un-optimized - packets for Flow#1 goes to q#20, and then reprogrammed back to q#10 later and so on; and Flow #2 programming is never done as Flow#1 is matched first for all misses. Many flows may wrongly share the same hash and reprogram rules of the original flow each with their own q#. Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS, enable aRFS (global value is 144 * rps_flow_cnt). Test notes about results from ice_rx_flow_steer(): --------------------------------------------------- 1. "Skip:" counter increments here: if (fltr_info->q_index == rxq_idx || arfs_entry->fltr_state != ICE_ARFS_ACTIVE) goto out; 2. "Add:" counter increments here: ret = arfs_entry->fltr_info.fltr_id; INIT_HLIST_NODE(&arfs_entry->list_entry); 3. "Update:" counter increments here: /* update the queue to forward to on an already existing flow */ Runtime comparison: original code vs with the patch for different rps_flow_cnt values. +-------------------------------+--------------+--------------+ | rps_flow_cnt | 512 | 2048 | +-------------------------------+--------------+--------------+ | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K | | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K | | CPU User | 216 vs 183 | 216 vs 206 | | CPU System | 1441 vs 1171 | 1447 vs 1320 | | CPU Softirq | 1245 vs 920 | 1238 vs 961 | | CPU Total | 29 vs 22.7 | 29 vs 24.9 | | aRFS Update | 533K vs 59 | 521K vs 32 | | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M | +-------------------------------+--------------+--------------+ A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections showed no performance degradation. Some points on the patch/aRFS behavior: 1. Enabling full tuple matching ensures flows are always correctly matched, even with smaller hash sizes. 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs and fewer calls to driver for programming on misses. 3. Larger hash tables reduces mis-steering due to more unique flow hashes, but still has clashes. However, with larger per-device rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot be added if h/w limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt pkts have been processed by this cpu that are not part of the flow). Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <krikku@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17can: tcan4x5x: fix power regulator retrieval during probeBrett Werling
Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring the regulator pointer is not set to NULL in the successful return from devm_regulator_get_optional(). Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config") Signed-off-by: Brett Werling <brett.werling@garmin.com> Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-06-17aoe: defer rexmit timer downdev work to workqueueJustin Sanders
When aoe's rexmit_timer() notices that an aoe target fails to respond to commands for more than aoe_deadsecs, it calls aoedev_downdev() which cleans the outstanding aoe and block queues. This can involve sleeping, such as in blk_mq_freeze_queue(), which should not occur in irq context. This patch defers that aoedev_downdev() call to the aoe device's workqueue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665 Signed-off-by: Justin Sanders <jsanders.devel@gmail.com> Link: https://lore.kernel.org/r/20250610170600.869-2-jsanders.devel@gmail.com Tested-By: Valentin Kleibel <valentin@vrvis.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-17aoe: clean device rq_list in aoedev_downdev()Justin Sanders
An aoe device's rq_list contains accepted block requests that are waiting to be transmitted to the aoe target. This queue was added as part of the conversion to blk_mq. However, the queue was not cleaned out when an aoe device is downed which caused blk_mq_freeze_queue() to sleep indefinitely waiting for those requests to complete, causing a hang. This fix cleans out the queue before calling blk_mq_freeze_queue(). Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665 Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq") Signed-off-by: Justin Sanders <jsanders.devel@gmail.com> Link: https://lore.kernel.org/r/20250610170600.869-1-jsanders.devel@gmail.com Tested-By: Valentin Kleibel <valentin@vrvis.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-17ata: ahci: Disallow LPM for Asus B550-F motherboardMikko Korhonen
Asus ROG STRIX B550-F GAMING (WI-FI) motherboard has problems on some SATA ports with at least one hard drive model (WDC WD20EFAX-68FB5N0) when LPM is enabled. Disabling LPM solves the issue. Cc: stable@vger.kernel.org Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Signed-off-by: Mikko Korhonen <mjkorhon@gmail.com> Link: https://lore.kernel.org/r/20250617062055.784827-1-mjkorhon@gmail.com [cassel: more detailed comment, make single line comments consistent] Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-06-17gpio: pca953x: fix wrong error probe return valueSascha Hauer
The second argument to dev_err_probe() is the error value. Pass the return value of devm_request_threaded_irq() there instead of the irq number. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Fixes: c47f7ff0fe61 ("gpio: pca953x: Utilise dev_err_probe() where it makes sense") Link: https://lore.kernel.org/r/20250616134503.1201138-1-s.hauer@pengutronix.de Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-06-16drm/etnaviv: Protect the scheduler's pending list with its lockMaíra Canal
Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running. Hold the lock while manipulating the scheduler's pending list to prevent a race. Cc: stable@vger.kernel.org Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") Reported-by: Philipp Stanner <phasta@kernel.org> Closes: https://lore.kernel.org/dri-devel/964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org/ Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250602132240.93314-1-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-06-16drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`Maíra Canal
The following kernel Oops was recently reported by Mesa CI: [ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588 [ 800.148619] Mem abort info: [ 800.151402] ESR = 0x0000000096000005 [ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits [ 800.160444] SET = 0, FnV = 0 [ 800.163488] EA = 0, S1PTW = 0 [ 800.166619] FSC = 0x05: level 1 translation fault [ 800.171487] Data abort info: [ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000 [ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight [ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1 [ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d] [ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d] [ 800.267251] sp : ffffffc080003e60 [ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000 [ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020 [ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158 [ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000 [ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000 [ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c [ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0 [ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980 [ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382 [ 800.341859] Call trace: [ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d] [ 800.349086] v3d_irq+0x124/0x2e0 [v3d] [ 800.352835] __handle_irq_event_percpu+0x58/0x218 [ 800.357539] handle_irq_event+0x54/0xb8 [ 800.361369] handle_fasteoi_irq+0xac/0x240 [ 800.365458] handle_irq_desc+0x48/0x68 [ 800.369200] generic_handle_domain_irq+0x24/0x38 [ 800.373810] gic_handle_irq+0x48/0xd8 [ 800.377464] call_on_irq_stack+0x24/0x58 [ 800.381379] do_interrupt_handler+0x88/0x98 [ 800.385554] el1_interrupt+0x34/0x68 [ 800.389123] el1h_64_irq_handler+0x18/0x28 [ 800.393211] el1h_64_irq+0x64/0x68 [ 800.396603] default_idle_call+0x3c/0x168 [ 800.400606] do_idle+0x1fc/0x230 [ 800.403827] cpu_startup_entry+0x40/0x50 [ 800.407742] rest_init+0xe4/0xf0 [ 800.410962] start_kernel+0x5e8/0x790 [ 800.414616] __primary_switched+0x80/0x90 [ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1) [ 800.424707] ---[ end trace 0000000000000000 ]--- [ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- This issue happens when the file descriptor is closed before the jobs submitted by it are completed. When the job completes, we update the global GPU stats and the per-fd GPU stats, which are exposed through fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv` and its stats were already freed and we can't update the per-fd stats. Therefore, if the file descriptor was already closed, don't update the per-fd GPU stats, only update the global ones. Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://lore.kernel.org/r/20250602151451.10161-1-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-06-16scsi: elx: efct: Fix memory leak in efct_hw_parse_filter()Vitaliy Shevtsov
strsep() modifies the address of the pointer passed to it so that it no longer points to the original address. This means kfree() gets the wrong pointer. Fix this by passing unmodified pointer returned from kstrdup() to kfree(). Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 4df84e846624 ("scsi: elx: efct: Driver initialization routines") Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru> Link: https://lore.kernel.org/r/20250612163616.24298-1-v.shevtsov@mt-integration.ru Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-16EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fhAvadhut Naik
AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers (UMC) per processor die. The amd64_edac driver, however, assumes only 2 UMCs are supported since max_mcs variable for the models has not been explicitly set to 4. The same results in incomplete or incorrect memory information being logged to dmesg by the module during initialization in some instances. Fixes: 6c79e42169fe ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh") Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/ Reported-by: reox <mailinglist@reox.at> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
2025-06-16scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port()Maurizio Lombardi
The function core_scsi3_decode_spec_i_port(), in its error code path, unconditionally calls core_scsi3_lunacl_undepend_item() passing the dest_se_deve pointer, which may be NULL. This can lead to a NULL pointer dereference if dest_se_deve remains unset. SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg Unable to handle kernel paging request at virtual address dfff800000000012 Call trace: core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P) core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod] core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod] target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod] Fix this by adding a NULL check before calling core_scsi3_lunacl_undepend_item() Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>