Age | Commit message (Collapse) | Author |
|
For WMI_REQUEST_VDEV_STAT request, firmware might split response into
multiple events dut to buffer limit, hence currently in
ath12k_wmi_fw_stats_process() host waits until all events received. In
case there is no vdev started, this results in that below condition
would never get satisfied
((++ar->fw_stats.num_vdev_recvd) == total_vdevs_started)
consequently the requestor would be blocked until time out.
The same applies to WMI_REQUEST_BCN_STAT request as well due to:
((++ar->fw_stats.num_bcn_recvd) == ar->num_started_vdevs)
Change to check the number of started vdev first: if it is zero, finish
directly; if not, follow the old way.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-4-12f594f3b857@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Currently ath12k_wmi_fw_stats_process() is using static variables to count
firmware stat events. Taking num_vdev as an example, if for whatever
reason (say ar->num_started_vdevs is 0 or firmware bug etc.) the following
condition
(++num_vdev) == total_vdevs_started
is not met, is_end is not set thus num_vdev won't be cleared. Next time
when firmware stats is requested again, even if everything is working
fine, failure is expected due to the condition above will never be
satisfied.
The same applies to num_bcn as well.
Change to use non-static counters and reset them each time before firmware
stats is requested.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-3-12f594f3b857@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
ath12k_mac_get_fw_stats() is busy polling fw_stats_done flag while waiting
firmware finishing sending all events. This is not good as CPU is
monopolized and kept burning during the wait.
Change to the completion mechanism to fix it.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
Fixes: e367c924768b ("wifi: ath12k: Request vdev stats from firmware")
Reported-by: Grégoire Stein <gregoire.s93@live.fr>
Closes: https://lore.kernel.org/ath12k/AS8P190MB120575BBB25FCE697CD7D4988763A@AS8P190MB1205.EURP190.PROD.OUTLOOK.COM/
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Tested-by: Grégoire Stein <gregoire.s93@live.fr>
Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-2-12f594f3b857@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Regarding the firmware stats events handling, the comment in
ath12k_mac_get_fw_stats() says host determines whether all events have been
received based on 'end' tag in TLV. This is wrong as there is no such tag
at all, actually host makes the decision totally by itself based on the
stats type and active pdev/vdev counts etc.
Fix it to correctly reflect the logic.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Tested-on: QCN9274 hw2.0 WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Link: https://patch.msgid.link/20250612-ath12k-fw-fixes-v1-1-12f594f3b857@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
In case of ML connection, currently all useful links are activated at
ASSOC stage:
ieee80211_set_active_links(vif, ieee80211_vif_usable_links(vif))
this results in firmware crash when the number of links activated on the
same device is more than supported.
Since firmware supports activating at most 2 links for a ML connection,
to avoid firmware crash, host needs to select 2 links out of the useful
links. As the assoc link has already been chosen, the question becomes
how to determine partner links. A straightforward principle applied
here is that the resulted combination should achieve the best throughput.
For that purpose, ideally various factors like bandwidth, RSSI etc should
be considered. But that would be too complicate. To make it easy, the
choice is to only take hardware modes into consideration.
The SBS (single band simultaneously) mode frequency range covers 5 GHz
and 6 GHz bands. In this mode, the two individual MACs are both active,
with one working on 5g-high band and the other on 5g-low band (from
hardware perspective 5 GHz and 6 GHz bands are referred to as a 'large'
single 5 GHz band). The DBS (dual band simultaneously) mode covers 2 GHz
band and the 'large' 5 GHz band, with one MAC working on 2 GHz band and
the other working on 5 GHz band or 6 GHz band. Since 5,6 GHz bands could
provide higher bandwidth than 2 GHz band, the preference is given to SBS
mode. Other hardware modes results in only one working MAC at any given
time, so it is chosen only when both SBS are DBS are not possible.
For each hardware mode, if there are more than one partner candidate,
just choose the first one.
For now only single device MLO case is handled as it is easy. Other cases
could be addressed in the future.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-6-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
In case of two links established on the same device in an ML connection,
depending on device's hardware mode capability, it is possible that both
links fall on the same MAC. Currently, no specific action is taken to
address this but just keep both links active. However this would result
in lower throughput compared to even one link, because switching between
these two links on the resulted MAC significantly impacts throughput.
Check if both links fall in the frequency range of a single MAC. If
so, send WMI_MLO_LINK_SET_ACTIVE_CMDID command to firmware such that
firmware can deactivate one of them. Note the decision of which link
getting deactivated is made by firmware, host only sends the vdev
lists.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-5-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Add WMI_MLO_LINK_SET_ACTIVE_CMDID command. This command allows host to
send required link information to firmware such that firmware can make
decision on activating/deactivating links in various scenarios.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-4-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Previous patches parse and save hardware MAC frequency range information
in ath12k_svc_ext_info structure. Such range represents hardware
capability hence needs to be updated based on host information, e.g. guard
the range based on host's low/high boundary.
So update frequency range. The updated range is saved in
ath12k_hw_mode_info structure and would be used when doing vdev activation
and link selection in following patches.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-3-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
WMI_SERVICE_READY_EXT2_EVENTID event
Firmware sends the boundary between lower and higher bands in
ath12k_wmi_dbs_or_sbs_cap_params structure embedded in
WMI_SERVICE_READY_EXT2_EVENTID event. The boundary is needed when
updating frequency range in the following patch. So parse and save
it for later use. Note ath12k_wmi_dbs_or_sbs_cap_params is placed
after some other structures, so placeholders for them are added
as well.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-2-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
WMI_SERVICE_READY_EXT_EVENTID event for later use
WLAN hardware might support various hardware modes such as DBS (dual
band simultaneously), SBS (single band simultaneously) and DBS_OR_SBS
etc, see enum wmi_host_hw_mode_config_type. Firmware advertises actual
supported modes in WMI_SERVICE_READY_EXT_EVENTID event. For each mode,
firmware advertises frequency range each hardware MAC can operate on.
In MLO case such information is necessary during vdev activation and
link selection (which is done in following patches), so add a new
structure ath12k_svc_ext_info to ath12k_wmi_base, then parse and save
those information to it for later use.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-1-54a29e7a3a88@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
When the ath12k driver is built without CONFIG_ATH12K_DEBUG, the
recently refactored stats code can cause any user space application
(such at NetworkManager) to consume 100% CPU for 3 seconds, every time
stats are read.
Commit 'b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of
debugfs")' moved ath12k_debugfs_fw_stats_request() out of debugfs, by
merging the additional logic into ath12k_mac_get_fw_stats().
Among the added responsibility of ath12k_mac_get_fw_stats() was the
busy-wait for `fw_stats_done`.
Signalling of `fw_stats_done` happens when one of the
WMI_REQUEST_PDEV_STAT, WMI_REQUEST_VDEV_STAT, and WMI_REQUEST_BCN_STAT
messages are received, but the handling of the latter two commands remained
in the debugfs code. As `fw_stats_done` isn't signalled, the calling
processes will spin until the timeout (3 seconds) is reached.
Moving the handling of these two additional responses out of debugfs
resolves the issue.
Fixes: b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of debugfs")
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Tested-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://patch.msgid.link/20250609-ath12k-fw-stats-done-v1-1-2b3624656697@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
clocks
There is a bug in ptp_clock_adjtime() which makes it refuse the
operation even if we just want to read the current clock dialed
frequency, not modify anything (tx->modes == 0). That should be possible
even if the clock is free-running. For context, the kernel UAPI is the
same for getting and setting the frequency of a POSIX clock.
For example, ptp4l errors out at clock_create() -> clockadj_get_freq()
-> clock_adjtime() time, when it should logically only have failed on
actual adjustments to the clock, aka if the clock was configured as
slave. But in master mode it should work.
This was discovered when examining the issue described in the previous
commit, where ptp_clock_freerun() returned true despite n_vclocks being
zero.
Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250613174749.406826-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
What is broken
--------------
ptp4l, and any other application which calls clock_adjtime() on a
physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c
("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()").
Explanation for the breakage
----------------------------
The blamed commit was based on the false assumption that
ptp_vclock_in_use() callers already test for n_vclocks prior to calling
this function.
This is notably incorrect for the code path below, in which there is, in
fact, no n_vclocks test:
ptp_clock_adjtime()
-> ptp_clock_freerun()
-> ptp_vclock_in_use()
The result is that any clock adjustment on any physical clock is now
impossible. This is _despite_ there not being any vclock over this
physical clock.
$ ptp4l -i eno0 -2 -P -m
ptp4l[58.425]: selected /dev/ptp0 as PTP clock
[ 58.429749] ptp: physical clock is free running
ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy
failed to create a clock
$ cat /sys/class/ptp/ptp0/n_vclocks
0
The patch makes the ptp_vclock_in_use() function say "if it's not a
virtual clock, then this physical clock does have virtual clocks on
top".
Then ptp_clock_freerun() uses this information to say "this physical
clock has virtual clocks on top, so it must stay free-running".
Then ptp_clock_adjtime() uses this information to say "well, if this
physical clock has to be free-running, I can't do it, return -EBUSY".
Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the
test whether vclocks are in use.
What did the blamed commit intend to fix
----------------------------------------
The blamed commit presents a lockdep warning stating "possible recursive
locking detected", with the n_vclocks_store() and ptp_clock_unregister()
functions involved.
The recursive locking seems this:
n_vclocks_store()
-> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1
-> device_for_each_child_reverse(..., unregister_vclock)
-> unregister_vclock()
-> ptp_vclock_unregister()
-> ptp_clock_unregister()
-> ptp_vclock_in_use()
-> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2
The issue can be triggered by creating and then deleting vclocks:
$ echo 2 > /sys/class/ptp/ptp0/n_vclocks
$ echo 0 > /sys/class/ptp/ptp0/n_vclocks
But note that in the original stack trace, the address of the first lock
is different from the address of the second lock. This is because at
step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent
(physical) PTP clock, and at step 2, the lock is of the child (virtual)
PTP clock. They are different locks of different devices.
In this situation there is no real deadlock, the lockdep warning is
caused by the fact that the mutexes have the same lock class on both the
parent and the child. Functionally it is fine.
Proposed alternative solution
-----------------------------
We must reintroduce the body of ptp_vclock_in_use() mostly as it was
structured prior to the blamed commit, but avoid the lockdep warning.
Based on the fact that vclocks cannot be nested on top of one another
(ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we
already know that ptp->n_vclocks is zero for a virtual clock. And
ptp->is_virtual_clock is a runtime invariant, established at
ptp_clock_register() time and never changed. There is no need to
serialize on any mutex in order to read ptp->is_virtual_clock, and we
take advantage of that by moving it outside the lock.
Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at
all, and step 2 in the code walkthrough above can simply go away.
We can simply return false to the question "ptp_vclock_in_use(a virtual
clock)".
Other notes
-----------
Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns
execution seems racy, because the returned value can become stale as
soon as the function returns and before the return value is used (i.e.
n_vclocks_store() can run any time). The locking requirement should
somehow be transferred to the caller, to ensure a longer life time for
the returned value, but this seems out of scope for this severe bug fix.
Because we are also fixing up the logic from the original commit, there
is another Fixes: tag for that.
Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()")
Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250613174749.406826-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The commit under the Fixes tag below which updates the VNICs' RSS
and MRU during .ndo_queue_start(), needs to be extended to cover any
non-default RSS contexts which have their own VNICs. Without this
step, packets that are destined to a non-default RSS context may be
dropped after .ndo_queue_start().
We further optimize this scheme by updating the VNIC only if the
RX ring being restarted is in the RSS table of the VNIC. Updating
the VNIC (in particular setting the MRU to 0) will momentarily stop
all traffic to all rings in the RSS table. Any VNIC that has the
RX ring excluded from the RSS table can skip this step and avoid the
traffic disruption.
Note that this scheme is just an improvement. A VNIC with multiple
rings in the RSS table will still see traffic disruptions to all rings
in the RSS table when one of the rings is being restarted. We are
working on a FW scheme that will improve upon this further.
Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table")
Reported-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new helper function that will configure MRU and RSS table
of a VNIC. This will be useful when we configure both on a VNIC
when resetting an RX ring. This function will be used again in
the next bug fix patch where we have to reconfigure VNICs for RSS
contexts.
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before the commit under the Fixes tag below, bnxt_ulp_stop() and
bnxt_ulp_start() were always invoked in pairs. After that commit,
the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop()
has been called. This may result in the RoCE driver's aux driver
.suspend() method being invoked twice. The 2nd bnxt_re_suspend()
call will crash when it dereferences a NULL pointer:
(NULL ib_device): Handle device suspend call
BUG: kernel NULL pointer dereference, address: 0000000000000b78
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en]
RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re]
Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00
RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff
RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff
R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070
R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bnxt_ulp_stop+0x69/0x90 [bnxt_en]
bnxt_sp_task+0x678/0x920 [bnxt_en]
? __schedule+0x514/0xf50
process_scheduled_works+0x9d/0x400
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag
is already set. This will preserve the original symmetrical
bnxt_ulp_stop() and bnxt_ulp_start().
Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED
flag after taking the mutex to avoid any race condition. And for
symmetry, only proceed in bnxt_ulp_start() if the
BNXT_EN_FLAG_ULP_STOPPED is set.
Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-06-17
The patch is by Brett Werling, and fixes the power regulator retrieval
during probe of the tcan4x5x glue code for the m_can driver.
* tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: tcan4x5x: fix power regulator retrieval during probe
openvswitch: Allocate struct ovs_pcpu_storage dynamically
====================
Link: https://patch.msgid.link/20250617155123.2141584-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While transmitting XDP frames for XDP_TX, page_pool is
used to get the DMA buffers (already mapped to the pages)
and need to be freed/reycled once the transmission is complete.
This need not be explicitly done by the driver as this is handled
more gracefully by the xdp driver while returning the xdp frame.
__xdp_return() frees the XDP memory based on its memory type,
under which page_pool memory is also handled. This change fixes
the transmit queue timeout while running XDP_TX.
logs:
[ 309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms
[ 313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms
[ 319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms
...
Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250616063319.3347541-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the first driver for Apple Silicon was upstreamed we accidentally
included `default ARCH_APPLE` in its Kconfig which then spread to almost
every subsequent driver. As soon as ARCH_APPLE is set to y this will
pull in many drivers as built-ins which is not what we want.
Thus, drop `default ARCH_APPLE` from Kconfig.
Signed-off-by: Sven Peter <sven@kernel.org>
Link: https://lore.kernel.org/r/20250612-apple-kconfig-defconfig-v1-8-0e6f9cb512c1@kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
It is very useful to find driver implementing compatibles with `git grep
compatible`, so driver should not use defines for that string, even if
this means string will be effectively duplicated.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250613071653.46809-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/hsmp: Timeout handling fixes
- amd/pmc:
- Clear metrics table at start of cycle
- Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
- amd/pmf: Fix error handling corner cases (nth attempt)
- alienware-wmi-wmax: Revert G-Mode support as it lowers performance
- dell_rbu:
- Fix sparse lock context warning
- Fix list head usage
- Don't overwrite data buffer past the size of the last packet
- ideapad-laptop: Ensure EC is not polled too frequently
- intel-uncore-freq:
- Fail module load when plat_info is NULL
- Avoid a non-literal format string as it triggers a compiler warning
- intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry
- intel/power-domains: Fix error code in tpmi_init()
- samsung-galaxybook: Add support for Notebook 9 Pro and others
(SAM0426)
* tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1"
platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
platform/x86/intel-uncore-freq: avoid non-literal format string
platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry
platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry
MAINTAINERS: .mailmap: Update Hans de Goede's email address
platform/x86: dell_rbu: Bump version
platform/x86: dell_rbu: Stop overwriting data buffer
platform/x86: dell_rbu: Fix list usage
platform/x86: dell_rbu: Fix lock context warning
platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc()
platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice
platform/x86/amd: pmf: Use device managed allocations
x86/platform/amd: replace down_timeout() with down_interruptible()
x86/platform/amd: move final timeout check to after final sleep
platform/x86/amd: pmc: Clear metrics table at start of cycle
platform/x86/intel: power-domains: Fix error code in tpmi_init()
platform/x86: samsung-galaxybook: Add SAM0426
platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL
platform/x86: ideapad-laptop: use usleep_range() for EC polling
|
|
The obj_event may be loaded immediately after inserted, then if the
list_head is not initialized then we may get a poisonous pointer. This
fixes the crash below:
mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
mlx5_core.sf mlx5_core.sf.4: firmware version: 32.38.3056
mlx5_core 0000:03:00.0 en3f0pf0sf2002: renamed from eth0
mlx5_core.sf mlx5_core.sf.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
IPv6: ADDRCONF(NETDEV_CHANGE): en3f0pf0sf2002: link becomes ready
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000007760fb000
[0000000000000060] pgd=000000076f6d7003, p4d=000000076f6d7003, pud=0000000777841003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
Modules linked in: ipmb_host(OE) act_mirred(E) cls_flower(E) sch_ingress(E) mptcp_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) tcp_diag(E) inet_diag(E) binfmt_misc(E) bonding(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) isofs(E) cdrom(E) mst_pciconf(OE) ib_umad(OE) mlx5_ib(OE) ipmb_dev_int(OE) mlx5_core(OE) kpatch_15237886(OEK) mlxdevm(OE) auxiliary(OE) ib_uverbs(OE) ib_core(OE) psample(E) mlxfw(OE) tls(E) sunrpc(E) vfat(E) fat(E) crct10dif_ce(E) ghash_ce(E) sha1_ce(E) sbsa_gwdt(E) virtio_console(E) ext4(E) mbcache(E) jbd2(E) xfs(E) libcrc32c(E) mmc_block(E) virtio_net(E) net_failover(E) failover(E) sha2_ce(E) sha256_arm64(E) nvme(OE) nvme_core(OE) gpio_mlxbf3(OE) mlx_compat(OE) mlxbf_pmc(OE) i2c_mlxbf(OE) sdhci_of_dwcmshc(OE) pinctrl_mlxbf3(OE) mlxbf_pka(OE) gpio_generic(E) i2c_core(E) mmc_core(E) mlxbf_gige(OE) vitesse(E) pwr_mlxbf(OE) mlxbf_tmfifo(OE) micrel(E) mlxbf_bootctl(OE) virtio_ring(E) virtio(E) ipmi_devintf(E) ipmi_msghandler(E)
[last unloaded: mst_pci]
CPU: 11 PID: 20913 Comm: rte-worker-11 Kdump: loaded Tainted: G OE K 5.10.134-13.1.an8.aarch64 #1
Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.2.2.12968 Oct 26 2023
pstate: a0400089 (NzCv daIf +PAN -UAO -TCO BTYPE=--)
pc : dispatch_event_fd+0x68/0x300 [mlx5_ib]
lr : devx_event_notifier+0xcc/0x228 [mlx5_ib]
sp : ffff80001005bcf0
x29: ffff80001005bcf0 x28: 0000000000000001
x27: ffff244e0740a1d8 x26: ffff244e0740a1d0
x25: ffffda56beff5ae0 x24: ffffda56bf911618
x23: ffff244e0596a480 x22: ffff244e0596a480
x21: ffff244d8312ad90 x20: ffff244e0596a480
x19: fffffffffffffff0 x18: 0000000000000000
x17: 0000000000000000 x16: ffffda56be66d620
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000040 x10: ffffda56bfcafb50
x9 : ffffda5655c25f2c x8 : 0000000000000010
x7 : 0000000000000000 x6 : ffff24545a2e24b8
x5 : 0000000000000003 x4 : ffff80001005bd28
x3 : 0000000000000000 x2 : 0000000000000000
x1 : ffff244e0596a480 x0 : ffff244d8312ad90
Call trace:
dispatch_event_fd+0x68/0x300 [mlx5_ib]
devx_event_notifier+0xcc/0x228 [mlx5_ib]
atomic_notifier_call_chain+0x58/0x80
mlx5_eq_async_int+0x148/0x2b0 [mlx5_core]
atomic_notifier_call_chain+0x58/0x80
irq_int_handler+0x20/0x30 [mlx5_core]
__handle_irq_event_percpu+0x60/0x220
handle_irq_event_percpu+0x3c/0x90
handle_irq_event+0x58/0x158
handle_fasteoi_irq+0xfc/0x188
generic_handle_irq+0x34/0x48
...
Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://patch.msgid.link/r/3ce7f20e0d1a03dc7de6e57494ec4b8eaf1f05c2.1750147949.git.leon@kernel.org
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The GID cache warning messages can flood the kernel log when there are
multiple failed attempts to add GIDs. This can happen when creating many
virtual interfaces without having enough space for their GIDs in the GID
table.
Change pr_warn to pr_warn_ratelimited to prevent log flooding while still
maintaining visibility of the issue.
Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.1750062357.git.leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
__xa_store() and __xa_erase() were used without holding the proper lock,
which led to a lockdep warning due to unsafe RCU usage. This patch
replaces them with xa_store() and xa_erase(), which perform the necessary
locking internally.
=============================
WARNING: suspicious RCPU usage
6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Not tainted
-----------------------------
./include/linux/xarray.h:1211 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by kworker/u136:0/219:
at: process_one_work+0xbe4/0x15f0
process_one_work+0x75c/0x15f0
pagefault_mr+0x9a5/0x1390 [mlx5_ib]
stack backtrace:
CPU: 14 UID: 0 PID: 219 Comm: kworker/u136:0 Not tainted
6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
Call Trace:
dump_stack_lvl+0xa8/0xc0
lockdep_rcu_suspicious+0x1e6/0x260
xas_create+0xb8a/0xee0
xas_store+0x73/0x14c0
__xa_store+0x13c/0x220
? xa_store_range+0x390/0x390
? spin_bug+0x1d0/0x1d0
pagefault_mr+0xcb5/0x1390 [mlx5_ib]
? _raw_spin_unlock+0x1f/0x30
mlx5_ib_eqe_pf_action+0x3be/0x2620 [mlx5_ib]
? lockdep_hardirqs_on_prepare+0x400/0x400
? mlx5_ib_invalidate_range+0xcb0/0xcb0 [mlx5_ib]
process_one_work+0x7db/0x15f0
? pwq_dec_nr_in_flight+0xda0/0xda0
? assign_work+0x168/0x240
worker_thread+0x57d/0xcd0
? rescuer_thread+0xc40/0xc40
kthread+0x3b3/0x800
? kthread_is_per_cpu+0xb0/0xb0
? lock_downgrade+0x680/0x680
? do_raw_spin_lock+0x12d/0x270
? spin_bug+0x1d0/0x1d0
? finish_task_switch.isra.0+0x284/0x9e0
? lockdep_hardirqs_on_prepare+0x284/0x400
? kthread_is_per_cpu+0xb0/0xb0
ret_from_fork+0x2d/0x70
? kthread_is_per_cpu+0xb0/0xb0
ret_from_fork_asm+0x11/0x20
Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free")
Link: https://patch.msgid.link/r/a85ddd16f45c8cb2bc0a188c2b0fcedfce975eb8.1750061791.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in
the software STRAP is incorrect. This causes the PTP timer to run at the
wrong rate and can lead to synchronization issues.
The STRAP value is configured by the system firmware, and a firmware
update is not always possible. Since the XTAL clock on these systems
always runs at 38.4MHz, the driver may ignore the STRAP and just set
the correct value.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Reviewed-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add simple eswitch mode checker in attaching VF procedure and allocate
required port representor memory structures only in switchdev mode.
The reset flows triggers VF (if present) detach/attach procedure.
It might involve VF port representor(s) re-creation if the device is
configured is switchdev mode (not legacy one).
The memory was blindly allocated in current implementation,
regardless of the mode and not freed if in legacy mode.
Kmemeleak trace:
unreferenced object (percpu) 0x7e3bce5b888458 (size 40):
comm "bash", pid 1784, jiffies 4295743894
hex dump (first 32 bytes on cpu 45):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
pcpu_alloc_noprof+0x4c4/0x7c0
ice_repr_create+0x66/0x130 [ice]
ice_repr_create_vf+0x22/0x70 [ice]
ice_eswitch_attach_vf+0x1b/0xa0 [ice]
ice_reset_all_vfs+0x1dd/0x2f0 [ice]
ice_pci_err_resume+0x3b/0xb0 [ice]
pci_reset_function+0x8f/0x120
reset_store+0x56/0xa0
kernfs_fop_write_iter+0x120/0x1b0
vfs_write+0x31c/0x430
ksys_write+0x61/0xd0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Testing hints (ethX is PF netdev):
- create at least one VF
echo 1 > /sys/class/net/ethX/device/sriov_numvfs
- trigger the reset
echo 1 > /sys/class/net/ethX/device/reset
Fixes: 415db8399d06 ("ice: make representor code generic")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This patch fixes an issue seen in a large-scale deployment under heavy
incoming pkts where the aRFS flow wrongly matches a flow and reprograms the
NIC with wrong settings. That mis-steering causes RX-path latency spikes
and noisy neighbor effects when many connections collide on the same
hash (some of our production servers have 20-30K connections).
set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by
hashing the skb sized by the per rx-queue table size. This results in
multiple connections (even across different rx-queues) getting the same
hash value. The driver steer function modifies the wrong flow to use this
rx-queue, e.g.: Flow#1 is first added:
Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10
Later when a new flow needs to be added:
Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20
The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This
results in both flows getting un-optimized - packets for Flow#1 goes to
q#20, and then reprogrammed back to q#10 later and so on; and Flow #2
programming is never done as Flow#1 is matched first for all misses. Many
flows may wrongly share the same hash and reprogram rules of the original
flow each with their own q#.
Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf
clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s
72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS,
enable aRFS (global value is 144 * rps_flow_cnt).
Test notes about results from ice_rx_flow_steer():
---------------------------------------------------
1. "Skip:" counter increments here:
if (fltr_info->q_index == rxq_idx ||
arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
goto out;
2. "Add:" counter increments here:
ret = arfs_entry->fltr_info.fltr_id;
INIT_HLIST_NODE(&arfs_entry->list_entry);
3. "Update:" counter increments here:
/* update the queue to forward to on an already existing flow */
Runtime comparison: original code vs with the patch for different
rps_flow_cnt values.
+-------------------------------+--------------+--------------+
| rps_flow_cnt | 512 | 2048 |
+-------------------------------+--------------+--------------+
| Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K |
| Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K |
| CPU User | 216 vs 183 | 216 vs 206 |
| CPU System | 1441 vs 1171 | 1447 vs 1320 |
| CPU Softirq | 1245 vs 920 | 1238 vs 961 |
| CPU Total | 29 vs 22.7 | 29 vs 24.9 |
| aRFS Update | 533K vs 59 | 521K vs 32 |
| aRFS Skip | 82M vs 77M | 7.2M vs 4.5M |
+-------------------------------+--------------+--------------+
A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections
showed no performance degradation.
Some points on the patch/aRFS behavior:
1. Enabling full tuple matching ensures flows are always correctly matched,
even with smaller hash sizes.
2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs
and fewer calls to driver for programming on misses.
3. Larger hash tables reduces mis-steering due to more unique flow hashes,
but still has clashes. However, with larger per-device rps_flow_cnt, old
flows take more time to expire and new aRFS flows cannot be added if h/w
limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt
pkts have been processed by this cpu that are not part of the flow).
Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
Signed-off-by: Krishna Kumar <krikku@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring
the regulator pointer is not set to NULL in the successful return from
devm_regulator_get_optional().
Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config")
Signed-off-by: Brett Werling <brett.werling@garmin.com>
Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add the required features detection glue to bugs.c et all in order to
support the TSA mitigation.
Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
|
|
Fix warnings reported by sparse, related to incorrect type:
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: warning: incorrect type in assignment (different base types)
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: expected restricted __virtio32 [usertype] len
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: got unsigned long
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404040339.S7CUIgf3-lkp@intel.com/
Fixes: 78034cbece79 ("platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20250613214608.2250130-1-davthompson@nvidia.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Without explicitly setting a parent for the watchdog device, the device is
registered with a NULL parent. This causes device_add() (called internally
by devm_watchdog_register_device()) to register the device under
/sys/devices/virtual, since no parent is provided. The result is:
DEVPATH=/devices/virtual/watchdog/watchdog0
To fix this, assign &pdev->dev as the parent of the watchdog device before
calling devm_watchdog_register_device(). This ensures the device is
associated with the Portwell EC platform device and placed correctly in
sysfs as:
DEVPATH=/devices/platform/portwell-ec/watchdog/watchdog0
This aligns the device hierarchy with expectations and avoids misplacement
under the virtual class.
Fixes: 835796753310 ("platform/x86: portwell-ec: Add GPIO and WDT driver for Portwell EC")
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
Link: https://lore.kernel.org/r/20250616074819.63547-1-ivan.hu@canonical.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
When aoe's rexmit_timer() notices that an aoe target fails to respond to
commands for more than aoe_deadsecs, it calls aoedev_downdev() which
cleans the outstanding aoe and block queues. This can involve sleeping,
such as in blk_mq_freeze_queue(), which should not occur in irq context.
This patch defers that aoedev_downdev() call to the aoe device's
workqueue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Link: https://lore.kernel.org/r/20250610170600.869-2-jsanders.devel@gmail.com
Tested-By: Valentin Kleibel <valentin@vrvis.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Link: https://lore.kernel.org/r/20250610170600.869-1-jsanders.devel@gmail.com
Tested-By: Valentin Kleibel <valentin@vrvis.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Asus ROG STRIX B550-F GAMING (WI-FI) motherboard has problems on some
SATA ports with at least one hard drive model (WDC WD20EFAX-68FB5N0)
when LPM is enabled. Disabling LPM solves the issue.
Cc: stable@vger.kernel.org
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Signed-off-by: Mikko Korhonen <mjkorhon@gmail.com>
Link: https://lore.kernel.org/r/20250617062055.784827-1-mjkorhon@gmail.com
[cassel: more detailed comment, make single line comments consistent]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
The second argument to dev_err_probe() is the error value. Pass the
return value of devm_request_threaded_irq() there instead of the irq
number.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Fixes: c47f7ff0fe61 ("gpio: pca953x: Utilise dev_err_probe() where it makes sense")
Link: https://lore.kernel.org/r/20250616134503.1201138-1-s.hauer@pengutronix.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
active") ensured that active jobs are returned to the pending list when
extending the timeout. However, it didn't use the pending list's lock to
manipulate the list, which causes a race condition as the scheduler's
workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent
a race.
Cc: stable@vger.kernel.org
Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
Reported-by: Philipp Stanner <phasta@kernel.org>
Closes: https://lore.kernel.org/dri-devel/964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org/
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250602132240.93314-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
The following kernel Oops was recently reported by Mesa CI:
[ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588
[ 800.148619] Mem abort info:
[ 800.151402] ESR = 0x0000000096000005
[ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits
[ 800.160444] SET = 0, FnV = 0
[ 800.163488] EA = 0, S1PTW = 0
[ 800.166619] FSC = 0x05: level 1 translation fault
[ 800.171487] Data abort info:
[ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000
[ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight
[ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1
[ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d]
[ 800.267251] sp : ffffffc080003e60
[ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000
[ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020
[ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158
[ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000
[ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000
[ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c
[ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0
[ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980
[ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382
[ 800.341859] Call trace:
[ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.349086] v3d_irq+0x124/0x2e0 [v3d]
[ 800.352835] __handle_irq_event_percpu+0x58/0x218
[ 800.357539] handle_irq_event+0x54/0xb8
[ 800.361369] handle_fasteoi_irq+0xac/0x240
[ 800.365458] handle_irq_desc+0x48/0x68
[ 800.369200] generic_handle_domain_irq+0x24/0x38
[ 800.373810] gic_handle_irq+0x48/0xd8
[ 800.377464] call_on_irq_stack+0x24/0x58
[ 800.381379] do_interrupt_handler+0x88/0x98
[ 800.385554] el1_interrupt+0x34/0x68
[ 800.389123] el1h_64_irq_handler+0x18/0x28
[ 800.393211] el1h_64_irq+0x64/0x68
[ 800.396603] default_idle_call+0x3c/0x168
[ 800.400606] do_idle+0x1fc/0x230
[ 800.403827] cpu_startup_entry+0x40/0x50
[ 800.407742] rest_init+0xe4/0xf0
[ 800.410962] start_kernel+0x5e8/0x790
[ 800.414616] __primary_switched+0x80/0x90
[ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1)
[ 800.424707] ---[ end trace 0000000000000000 ]---
[ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
This issue happens when the file descriptor is closed before the jobs
submitted by it are completed. When the job completes, we update the
global GPU stats and the per-fd GPU stats, which are exposed through
fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv`
and its stats were already freed and we can't update the per-fd stats.
Therefore, if the file descriptor was already closed, don't update the
per-fd GPU stats, only update the global ones.
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://lore.kernel.org/r/20250602151451.10161-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
strsep() modifies the address of the pointer passed to it so that it no
longer points to the original address. This means kfree() gets the wrong
pointer.
Fix this by passing unmodified pointer returned from kstrdup() to
kfree().
Found by Linux Verification Center (linuxtesting.org) with Svace.
Fixes: 4df84e846624 ("scsi: elx: efct: Driver initialization routines")
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru>
Link: https://lore.kernel.org/r/20250612163616.24298-1-v.shevtsov@mt-integration.ru
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers
(UMC) per processor die.
The amd64_edac driver, however, assumes only 2 UMCs are supported since
max_mcs variable for the models has not been explicitly set to 4. The same
results in incomplete or incorrect memory information being logged to dmesg by
the module during initialization in some instances.
Fixes: 6c79e42169fe ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh")
Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/
Reported-by: reox <mailinglist@reox.at>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
|
|
The function core_scsi3_decode_spec_i_port(), in its error code path,
unconditionally calls core_scsi3_lunacl_undepend_item() passing the
dest_se_deve pointer, which may be NULL.
This can lead to a NULL pointer dereference if dest_se_deve remains
unset.
SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg
Unable to handle kernel paging request at virtual address dfff800000000012
Call trace:
core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P)
core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod]
core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod]
target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod]
Fix this by adding a NULL check before calling
core_scsi3_lunacl_undepend_item()
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Number of apqn target list entries contained in 'nr_apqns' variable is
determined by userspace via an ioctl call so the result of the product in
calculation of size passed to memdup_user() may overflow.
In this case the actual size of the allocated area and the value
describing it won't be in sync leading to various types of unpredictable
behaviour later.
Use a proper memdup_array_user() helper which returns an error if an
overflow is detected. Note that it is different from when nr_apqns is
initially zero - that case is considered valid and should be handled in
subsequent pkey_handler implementations.
Found by Linux Verification Center (linuxtesting.org).
Fixes: f2bbc96e7cfa ("s390/pkey: add CCA AES cipher key support")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250611192011.206057-1-pchelkin@ispras.ru
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The fault enabled bits were being mistankenly enabled twice in case the FW
property is present. Remove one of the writes.
Fixes: cbc29538dbf7 ("hwmon: Add driver for LTC4282")
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20250611-fix-ltc4282-repetead-write-v1-1-fe46edd08cf1@analog.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Passing a pointer to an unaligned integer as a function argument is
undefined behavior:
drivers/hwmon/occ/common.c:492:27: warning: taking address of packed member 'accumulator' of class or structure 'power_sensor_2' may result in an unaligned pointer value [-Waddress-of-packed-member]
492 | val = occ_get_powr_avg(&power->accumulator,
| ^~~~~~~~~~~~~~~~~~
drivers/hwmon/occ/common.c:493:13: warning: taking address of packed member 'update_tag' of class or structure 'power_sensor_2' may result in an unaligned pointer value [-Waddress-of-packed-member]
493 | &power->update_tag);
| ^~~~~~~~~~~~~~~~~
Move the get_unaligned() calls out of the function and pass these
through argument registers instead.
Fixes: c10e753d43eb ("hwmon (occ): Add sensor types and versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250610092553.2641094-1-arnd@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
clang produces an output with excessive stack usage when building the
occ_setup_sensor_attrs() function, apparently the result of having
a lot of struct literals and building with the -fno-strict-overflow
option that leads clang to skip some optimization in case the 'attr'
pointer overruns:
drivers/hwmon/occ/common.c:775:12: error: stack frame size (1392) exceeds limit (1280) in 'occ_setup_sensor_attrs' [-Werror,-Wframe-larger-than]
Replace the custom macros for initializing the attributes with a
simpler function call that does not run into this corner case.
Link: https://godbolt.org/z/Wf1Yx76a5
Fixes: 54076cb3b5ff ("hwmon (occ): Add sensor attributes and register hwmon device")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250610092315.2640039-1-arnd@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
In the fts_read() function, when handling hwmon_pwm_auto_channels_temp,
the code accesses the shared variable data->fan_source[channel] twice
without holding any locks. It is first checked against
FTS_FAN_SOURCE_INVALID, and if the check passes, it is read again
when used as an argument to the BIT() macro.
This creates a Time-of-Check to Time-of-Use (TOCTOU) race condition.
Another thread executing fts_update_device() can modify the value of
data->fan_source[channel] between the check and its use. If the value
is changed to FTS_FAN_SOURCE_INVALID (0xff) during this window, the
BIT() macro will be called with a large shift value (BIT(255)).
A bit shift by a value greater than or equal to the type width is
undefined behavior and can lead to a crash or incorrect values being
returned to userspace.
Fix this by reading data->fan_source[channel] into a local variable
once, eliminating the race condition. Additionally, add a bounds check
to ensure the value is less than BITS_PER_LONG before passing it to
the BIT() macro, making the code more robust against undefined behavior.
This possible bug was found by an experimental static analysis tool
developed by our team.
Fixes: 1c5759d8ce05 ("hwmon: (ftsteutates) Replace fanX_source with pwmX_auto_channels_temp")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://lore.kernel.org/r/20250606071640.501262-1-hanguidong02@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The datasheets for all the fan53555 variants (and clones using the same
interface) define so called soft start times, from enabling the regulator
until at least some percentage of the output (i.e. 92% for the rk860x
types) are available.
The regulator framework supports this with the enable_time property
but currently the fan53555 driver does not define enable_times for any
variant.
I ran into a problem with this while testing the new driver for the
Rockchip NPUs (rocket), which does runtime-pm including disabling and
enabling a rk8602 as needed. When reenabling the regulator while running
a load, fatal hangs could be observed while enabling the associated
power-domain, which the regulator supplies.
Experimentally setting the regulator to always-on, made the issue
disappear, leading to the missing delay to let power stabilize.
And as expected, setting the enable-time to a non-zero value
according to the datasheet also resolved the regulator-issue.
The datasheets in nearly all cases only specify "typical" values,
except for the fan53555 type 08. There both a typical and maximum
value are listed - 40uS apart.
For all typical values I've added 100uS to be on the safe side.
Individual details for the relevant regulators below:
- fan53526:
The datasheet for all variants lists a typical value of 150uS, so
make that 250uS with safety margin.
- fan53555:
types 08 and 18 (unsupported) are given a typical enable time of 135uS
but also a maximum of 175uS so use that value. All the other types only
have a typical time in the datasheet of 300uS, so give a bit margin by
setting it to 400uS.
- rk8600 + rk8602:
Datasheet reports a typical value of 260us, so use 360uS to be safe.
- syr82x + syr83x:
All datasheets report typical soft-start values of 300uS for these
regulators, so use 400uS.
- tcs452x:
Datasheet sadly does not report a soft-start time, so I've not set
an enable-time
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20250606190418.478633-1-heiko@sntech.de
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The gpio-spacemit-k1 driver can be compiled as a module. Add missing
MODULE_DEVICE_TABLE so it can be matched by modalias and automatically
loaded by udev.
Fixes: d00553240ef8 ("gpio: spacemit: add support for K1 SoC")
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250613-k1-gpio-of-table-v1-1-9015da8fdfdb@iscas.ac.cn
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
BXT_MIPI_TRANS_VTOTAL must be programmed with vtotal-1
instead of vtotal. Make it so.
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314150136.22564-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 7b3685c9b38c3097f465efec8b24dbed63258cf6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
i915_pmu.c may fail to build with GCOV and AutoFDO enabled.
../drivers/gpu/drm/i915/i915_pmu.c:116:3: error: call to '__compiletime_assert_487' declared with 'error' attribute: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1
116 | BUILD_BUG_ON(bit >
| ^
Here is a way to reproduce the issue:
$ git checkout v6.15
$ mkdir build
$ ./scripts/kconfig/merge_config.sh -O build -n -m <(cat <<EOF
CONFIG_DRM=y
CONFIG_PCI=y
CONFIG_DRM_I915=y
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y
CONFIG_AUTOFDO_CLANG=y
EOF
)
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
olddefconfig
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
CLANG_AUTOFDO_PROFILE=...PATH_TO_SOME_AFDO_PROFILE... \
drivers/gpu/drm/i915/i915_pmu.o
Although not super sure what happened, by reviewing the code, it should
depend on `__builtin_constant_p(bit)` directly instead of assuming
`__builtin_constant_p(config)` makes `bit` a builtin constant.
Also fix a nit, to reuse the `bit` local variable.
Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20250612083023.562585-1-tzungbi@kernel.org
(cherry picked from commit 686d773186bf72b739bab7e12eb8665d914676ee)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
If the PHY driver uses another PHY internally (e.g. in case of eUSB2,
repeaters are represented as PHYs), then it would trigger the following
lockdep splat because all PHYs use a single static lockdep key and thus
lockdep can not identify whether there is a dependency or not and
reports a false positive.
Make PHY subsystem use dynamic lockdep keys, assigning each driver a
separate key. This way lockdep can correctly identify dependency graph
between mutexes.
============================================
WARNING: possible recursive locking detected
6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 Not tainted
--------------------------------------------
kworker/u51:0/78 is trying to acquire lock:
ffff0008116554f0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
but task is already holding lock:
ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&phy->mutex);
lock(&phy->mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by kworker/u51:0/78:
#0: ffff000800010948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x18c/0x5ec
#1: ffff80008036bdb0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x5ec
#2: ffff0008094ac8f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x188
#3: ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
stack backtrace:
CPU: 0 UID: 0 PID: 78 Comm: kworker/u51:0 Not tainted 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 PREEMPT
Hardware name: Qualcomm CRD, BIOS 6.0.240904.BOOT.MXF.2.4-00528.1-HAMOA-1 09/ 4/2024
Workqueue: events_unbound deferred_probe_work_func
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x90/0xd0
dump_stack+0x18/0x24
print_deadlock_bug+0x258/0x348
__lock_acquire+0x10fc/0x1f84
lock_acquire+0x1c8/0x338
__mutex_lock+0xb8/0x59c
mutex_lock_nested+0x24/0x30
phy_init+0x4c/0x12c
snps_eusb2_hsphy_init+0x54/0x1a0
phy_init+0xe0/0x12c
dwc3_core_init+0x450/0x10b4
dwc3_core_probe+0xce4/0x15fc
dwc3_probe+0x64/0xb0
platform_probe+0x68/0xc4
really_probe+0xbc/0x298
__driver_probe_device+0x78/0x12c
driver_probe_device+0x3c/0x160
__device_attach_driver+0xb8/0x138
bus_for_each_drv+0x84/0xe0
__device_attach+0x9c/0x188
device_initial_probe+0x14/0x20
bus_probe_device+0xac/0xb0
deferred_probe_work_func+0x8c/0xc8
process_one_work+0x208/0x5ec
worker_thread+0x1c0/0x368
kthread+0x14c/0x20c
ret_from_fork+0x10/0x20
Fixes: 3584f6392f09 ("phy: qcom: phy-qcom-snps-eusb2: Add support for eUSB2 repeater")
Fixes: e2463559ff1d ("phy: amlogic: Add Amlogic AXG PCIE PHY Driver")
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZnpoAVGJMG4Zu-Jw@hovoldconsulting.com/
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250605-phy-subinit-v3-1-1e1e849e10cd@oss.qualcomm.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|