Age | Commit message (Collapse) | Author |
|
The following trace can be seen if a device is being unregistered while
its number of channels are being modified.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
RIP: 0010:__mutex_lock+0xc8a/0x1120
Call Trace:
<TASK>
ethtool_check_max_channel+0x1ea/0x880
ethnl_set_channels+0x3c3/0xb10
ethnl_default_set_doit+0x306/0x650
genl_family_rcv_msg_doit+0x1e3/0x2c0
genl_rcv_msg+0x432/0x6f0
netlink_rcv_skb+0x13d/0x3b0
genl_rcv+0x28/0x40
netlink_unicast+0x42e/0x720
netlink_sendmsg+0x765/0xc20
__sys_sendto+0x3ac/0x420
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is because unregister_netdevice_many_notify might run before the
rtnl lock section of ethnl operations, eg. set_channels in the above
example. In this example the rss lock would be destroyed by the device
unregistration path before being used again, but in general running
ethnl operations while dismantle has started is not a good idea.
Fix this by denying any operation on devices being unregistered. A check
was already there in ethnl_ops_begin, but not wide enough.
Note that the same issue cannot be seen on the ioctl version
(__dev_ethtool) because the device reference is retrieved from within
the rtnl lock section there. Once dismantle started, the net device is
unlisted and no reference will be found.
Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250116092159.50890-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot complained that free_netdev() was calling netif_napi_del()
after dev->lock mutex has been destroyed.
This fires a warning for CONFIG_DEBUG_MUTEXES=y builds.
Move mutex_destroy(&dev->lock) near the end of free_netdev().
[1]
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 5971 at kernel/locking/mutex.c:564 __mutex_lock_common kernel/locking/mutex.c:564 [inline]
WARNING: CPU: 0 PID: 5971 at kernel/locking/mutex.c:564 __mutex_lock+0xdac/0xee0 kernel/locking/mutex.c:735
Modules linked in:
CPU: 0 UID: 0 PID: 5971 Comm: syz-executor Not tainted 6.13.0-rc7-syzkaller-01131-g8d20dcda404d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:564 [inline]
RIP: 0010:__mutex_lock+0xdac/0xee0 kernel/locking/mutex.c:735
Code: 0f b6 04 38 84 c0 0f 85 1a 01 00 00 83 3d 6f 40 4c 04 00 75 19 90 48 c7 c7 60 84 0a 8c 48 c7 c6 00 85 0a 8c e8 f5 dc 91 f5 90 <0f> 0b 90 90 90 e9 c7 f3 ff ff 90 0f 0b 90 e9 29 f8 ff ff 90 0f 0b
RSP: 0018:ffffc90003317580 EFLAGS: 00010246
RAX: ee0f97edaf7b7d00 RBX: ffff8880299f8cb0 RCX: ffff8880323c9e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003317710 R08: ffffffff81602ac2 R09: 1ffff110170c519a
R10: dffffc0000000000 R11: ffffed10170c519b R12: 0000000000000000
R13: 0000000000000000 R14: 1ffff92000662ec4 R15: dffffc0000000000
FS: 000055557a046500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd581d46ff8 CR3: 000000006f870000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
netdev_lock include/linux/netdevice.h:2691 [inline]
__netif_napi_del include/linux/netdevice.h:2829 [inline]
netif_napi_del include/linux/netdevice.h:2848 [inline]
free_netdev+0x2d9/0x610 net/core/dev.c:11621
netdev_run_todo+0xf21/0x10d0 net/core/dev.c:11189
nsim_destroy+0x3c3/0x620 drivers/net/netdevsim/netdev.c:1028
__nsim_dev_port_del+0x14b/0x1b0 drivers/net/netdevsim/dev.c:1428
nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1440 [inline]
nsim_dev_reload_destroy+0x28a/0x490 drivers/net/netdevsim/dev.c:1661
nsim_drv_remove+0x58/0x160 drivers/net/netdevsim/dev.c:1676
device_remove drivers/base/dd.c:567 [inline]
Fixes: 1b23cdbd2bbc ("net: protect netdev->napi_list with netdev_lock()")
Reported-by: syzbot+85ff1051228a04613a32@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/678add43.050a0220.303755.0016.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250117224626.1427577-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
W=1 builds with gcc 14.2.1 report:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=]
4193 | "/pkg %s", buf);
It's upset that we let buf be full length but then we use 5
characters for "/pkg ".
The builds is also clear with clang version 19.1.5 now.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.
The current behaviour means that if all packets are dropped, there will
be:
- The initial SYN + MPC
- 2 retransmissions with MPC
- The next ones will be without MPTCP.
So typically ~3 seconds before falling back to TCP. In some networks
where some temporally blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections not to have MPTCP connections.
In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.
Interesting values are:
- 0: the first retransmission will be done without MPTCP options: quite
aggressive, but also a higher risk of detecting false-positive
MPTCP blackholes.
- >= 128: all SYN retransmissions will keep the MPTCP options: back to
the < 6.12 behaviour.
The default behaviour is not changed here.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Shinas Rasheed says:
====================
Fix race conditions in ndo_get_stats64
Fix race conditions in ndo_get_stats64 by storing tx/rx stats
locally and not availing per queue resources which could be torn
down during interface stop. Also remove stats fetch from
firmware which is currently unnecessary
====================
Link: https://patch.msgid.link/20250117094653.2588578-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update tx/rx stats locally, so that ndo_get_stats64()
can use that and not rely on per queue resources to obtain statistics.
The latter used to cause race conditions when the device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-5-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and causes a warning to issue.
The corresponding warn log for the PF is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace 63399811432ab69b ]---
Fixes: c3fad23cdc06 ("octeon_ep_vf: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-4-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update tx/rx stats locally, so that ndo_get_stats64()
can use that and not rely on per queue resources to obtain statistics.
The latter used to cause race conditions when the device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-3-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and causes a warning to issue.
The warn log is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace 63399811432ab69b ]---
Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-2-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sean Anderson says:
====================
net: xilinx: axienet: Report an error for bad coalesce settings
====================
Link: https://patch.msgid.link/20250116232954.2696930-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of silently ignoring invalid/unsupported settings, report an
error. Additionally, relax the check for non-zero usecs to apply only
when it will be used (i.e. when frames != 1).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250116232954.2696930-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of using literals, add some symbolic constants for the IRQ delay
timer calculation.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250116232954.2696930-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update header inclusions to follow IWYU (Include What You Use)
principle.
In this case replace of_gpio.h, which is subject to remove by the GPIOLIB
subsystem, with the respective headers that are being used by the driver.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250116153119.148097-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
of_gpio.h is deprecated and subject to remove. The drivers in question
don't use it, simply remove the unused header.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The functions th1520_mbox_suspend_noirq and th1520_mbox_resume_noirq are
intended to save and restore the interrupt mask registers in the MBOX
ICU0. However, the array used to store these registers was incorrectly
sized, leading to memory corruption when accessing all four registers.
This commit corrects the array size to accommodate all four interrupt
mask registers, preventing memory corruption during suspend and resume
operations.
Fixes: 5d4d263e1c6b ("mailbox: Introduce support for T-head TH1520 Mailbox driver")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/a99e72be-8490-4960-ad26-cbfef6af238f@stanley.mountain/
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
struct zynqmp_ipi_pdata __percpu *pdata is not a per-cpu variable,
so it should not be annotated with __percpu annotation.
Remove invalid __percpu annotation to fix several
zynqmp-ipi-mailbox.c:920:15: warning: incorrect type in assignment (different address spaces)
zynqmp-ipi-mailbox.c:920:15: expected struct zynqmp_ipi_pdata [noderef] __percpu *pdata
zynqmp-ipi-mailbox.c:920:15: got void *
zynqmp-ipi-mailbox.c:927:56: warning: incorrect type in argument 3 (different address spaces)
zynqmp-ipi-mailbox.c:927:56: expected unsigned int [usertype] *out_value
zynqmp-ipi-mailbox.c:927:56: got unsigned int [noderef] __percpu *
...
and several
drivers/mailbox/zynqmp-ipi-mailbox.c:924:9: warning: dereference of noderef expression
...
sparse warnings.
There were no changes in the resulting object file.
Cc: stable@vger.kernel.org
Fixes: 6ffb1635341b ("mailbox: zynqmp: handle SGI for shared IPI")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michal Simek <michal.simek@amd.com>
Reviewed-by: Tanmay Shah <tanmay.shah@amd.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add entry for the Samsung Exynos mailbox driver.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
The Samsung Exynos mailbox controller, used on Google GS101 SoC, has 16
flag bits for hardware interrupt generation and a shared register for
passing mailbox messages. When the controller is used by the
ACPM interface the shared register is ignored and the mailbox controller
acts as a doorbell. The controller just raises the interrupt to APM
after the ACPM interface has written the message to SRAM.
Add support for the Samsung Exynos mailbox controller.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add bindings for the Samsung Exynos Mailbox Controller.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
IPQ5424 mailbox do not have clock support and reuses msm8994_apcs_data.
Signed-off-by: Gokul Sriram Palanisamy <quic_gokulsri@quicinc.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add compatible for the Qualcomm IPQ5424 APCS block.
Signed-off-by: Gokul Sriram Palanisamy <quic_gokulsri@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
For some SoCs, boot firmware is using the same IPCC instance used
by Linux and it has kept CLEAR_ON_RECV_RD set which basically means
interrupt pending registers are cleared when RECV_ID is read and the
register automatically updates to the next pending interrupt/client
status based on priority.
Clear the CLEAR_ON_RECV_RD if it is set from the boot firmware.
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add a mailbox controller driver for the Microchip Inter-processor
Communication (IPC), which is used to send and receive data between
processors.
The driver uses the RISC-V Supervisor Binary Interface (SBI) to
communicate with software running in machine mode (M-mode) to access
the IPC hardware block.
Additional details on the Microchip vendor extension and the IPC
function IDs described in the driver can be found in the following
documentation:
https://github.com/linux4microchip/microchip-sbi-ecall-extension
This SBI interface in this driver is compatible with the Mi-V Inter-hart
Communication (IHC) IP.
Transmitting and receiving data through the mailbox framework is done
through struct mchp_ipc_msg.
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add a dt-binding for the Microchip Inter-Processor Communication (IPC)
mailbox controller.
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
The Tegra RCE (Camera) driver expects the mailbox to be empty before
processing the IVC messages. On RT kernel, the threads processing the
IVC messages (which are invoked after `mbox_chan_received_data()` is
called) may be on a different CPU or running with a higher priority
than the HSP interrupt handler thread. This can cause it to act on the
message before the mailbox gets cleared in the HSP interrupt handler
resulting in a loss of IVC notification.
Fix this by clearing the mailbox data register before calling
`mbox_chan_received_data()`.
Fixes: 8f585d14030d ("mailbox: tegra-hsp: Add tegra_hsp_sm_ops")
Fixes: 74c20dd0f892 ("mailbox: tegra-hsp: Add 128-bit shared mailbox support")
Cc: stable@vger.kernel.org
Signed-off-by: Pekka Pessi <ppessi@nvidia.com>
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
This code accidentally checks ->ctrl_base instead of ->mbox_base so the
error handling can never be triggered.
Fixes: a4123ffab9ec ("mailbox: mpfs: support new, syscon based, devicetree configuration")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
The devm_ioremap() function doesn't return error pointers, it returns
NULL. Update the error checking to match.
Fixes: 5d4d263e1c6b ("mailbox: Introduce support for T-head TH1520 Mailbox driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Wilczynski <m.wilczynski@samsung.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix regression in GFP output in trace events
It was reported that the GFP flags in trace events went from human
readable to just their hex values:
gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP to gfp_flags=0x140cca
This was caused by a change that added the use of enums in calculating
the GFP flags.
As defines get translated into their values in the trace event format
files, the user space tooling could easily convert the GFP flags into
their symbols via the __print_flags() helper macro.
The problem is that enums do not get converted, and the names of the
enums show up in the format files and user space tooling cannot
translate them.
Add TRACE_DEFINE_ENUM() around the enums used for GFP flags which is
the tracing infrastructure macro that informs the tracing subsystem
what the values for enums and it can then expose that to user space"
* tag 'trace-v6.13-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: gfp: Fix the GFP enum values shown for user space tracing tools
|
|
Previously there were two definitions of struct of_pci_range: one in
include/linux/of_address.h and another local to drivers/pci/of_property.c.
Rename the local struct of_pci_range to of_pci_range_entry to avoid
confusion.
Link: https://lore.kernel.org/r/20250117161037.643953-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
|
|
Add a new field called 'parent_bus_addr' to struct of_pci_range to use
when retrieving parent bus address information.
Refer to the diagram below to better understand that the bus fabric in
some systems (like i.MX8QXP) does not always use a 1:1 address map
between input and output.
Currently, many controller drivers use the cpu_addr_fixup() callback
that would often hardcode address translation directly in the code, e.g.,
"cpu_addr & CDNS_PLAT_CPU_TO_BUS_ADDR" or "cpu_addr + BUS_IATU_OFFSET",
etc., even though those translations *should* be described via DT.
However, the cpu_addr_fixup() can be eliminated if DT correctly reflects
hardware behavior and drivers use 'parent_bus_addr' in struct of_pci_range.
┌─────────┐ ┌────────────┐
┌─────┐ │ │ IA: 0x8ff8_0000 │ │
│ CPU ├───►│ ┌────►├─────────────────┐ │ PCI │
└─────┘ │ │ │ IA: 0x8ff0_0000 │ │ │
CPU Addr │ │ ┌─►├─────────────┐ │ │ Controller │
0x7ff8_0000─┼───┘ │ │ │ │ │ │
│ │ │ │ │ │ │ PCI Addr
0x7ff0_0000─┼──────┘ │ │ └──► IOSpace ─┼────────────►
│ │ │ │ │ 0
0x7000_0000─┼────────►├─────────┐ │ │ │
└─────────┘ │ └──────► CfgSpace ─┼────────────►
BUS Fabric │ │ │ 0
│ │ │
└──────────► MemSpace ─┼────────────►
IA: 0x8000_0000 │ │ 0x8000_0000
└────────────┘
bus@5f000000 {
compatible = "simple-bus";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x80000000 0x0 0x70000000 0x10000000>;
pcie@5f010000 {
compatible = "fsl,imx8q-pcie";
reg = <0x5f010000 0x10000>, <0x8ff00000 0x80000>;
reg-names = "dbi", "config";
#address-cells = <3>;
#size-cells = <2>;
device_type = "pci";
bus-range = <0x00 0xff>;
ranges = <0x81000000 0 0x00000000 0x8ff80000 0 0x00010000>,
<0x82000000 0 0x80000000 0x80000000 0 0x0ff00000>;
...
};
};
In the diagram above, the 'parent_bus_addr' field in struct of_pci_range
can indicate internal address (IA) address information.
Link: https://lore.kernel.org/r/20241119-pci_fixup_addr-v8-1-c4bfa5193288@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
Add i.MX8MQ, i.MX8Q and i.MX95 PCIe suspend/resume support.
Link: https://lore.kernel.org/r/20241126075702.4099164-10-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
|
|
Call common DWC suspend/resume function. Use DWC common iATU method to
send out PME_TURN_OFF message.
In old DWC implementations, PCIE_ATU_INHIBIT_PAYLOAD in iATU Ctrl2 register
is reserved, so the generic DWC implementation of sending the PME_Turn_Off
message using a dummy MMIO write cannot be used. Use the previous method to
kick off PME_TURN_OFF message for these platforms.
The System Reset Control (SRC) interface is used to toggle 'turnoff_reset'
to send PME_TURN_OFF and since the DWC implementation is used, it is not
needed now.
Replace the imx_pcie_stop_link() and imx_pcie_host_exit() by
dw_pcie_suspend_noirq() in imx_pcie_suspend_noirq().
Since dw_pcie_suspend_noirq() already does these, see below call stack:
dw_pcie_suspend_noirq()
dw_pcie_stop_link()
imx_pcie_stop_link()
pci->pp.ops->deinit()
imx_pcie_host_exit()
Replace the imx_pcie_host_init(), dw_pcie_setup_rc() and
imx_pcie_start_link() by dw_pcie_resume_noirq() in imx_pcie_resume_noirq().
Since dw_pcie_resume_noirq() already does these, see below call stack:
dw_pcie_resume_noirq()
pci->pp.ops->init()
imx_pcie_host_init()
dw_pcie_setup_rc()
dw_pcie_start_link()
imx_pcie_start_link(;
Link: https://lore.kernel.org/r/20241126075702.4099164-9-hongxing.zhu@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
!CONFIG_PCIE_DW_HOST
Previously pcie-designware.h declared dw_pcie_suspend_noirq() and
dw_pcie_resume_noirq() unconditionally, even though they were only
implemented when CONFIG_PCIE_DW_HOST was defined.
Add no-op stubs for them when CONFIG_PCIE_DW_HOST is not defined so
drivers that support both Root Complex and Endpoint modes don't need
Link: https://lore.kernel.org/r/20250117213810.GA656803@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. This hybrid nature is undesirable.
Since all users of pci_intx() have by now been ported either to
always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the
devres functionality can be removed from pci_intx().
Consequently, pci_intx_unmanaged() is now redundant, because pci_intx()
itself is now unmanaged.
Remove the devres functionality from pci_intx(). Have all users of
pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-13-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
broadcom/bnx2x and brocade/bna enable their PCI devices with
pci_enable_device(). Thus, they need the never-managed version.
Replace pci_intx() with pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-5-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
All users of amd_mp2_pci_remove(), where pci_intx() is used, call
pcim_enable_device(), which is why the driver needs the always-managed
version.
Replace pci_intx() with pcim_intx().
Link: https://lore.kernel.org/r/20241209130632.132074-12-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
qtnfmac enables its PCI device with pcim_enable_device(). Thus, it needs
the always-managed version.
Replace pci_intx() with pcim_intx().
Link: https://lore.kernel.org/r/20241209130632.132074-11-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kalle Valo <kvalo@kernel.org>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
All users in ata enable their PCI devices with pcim_enable_device(). Thus,
they need the always-managed version.
Replace pci_intx() with pcim_intx().
Link: https://lore.kernel.org/r/20241209130632.132074-10-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Acked-by: Niklas Cassel <cassel@kernel.org>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
MSI sets up its own separate devres callback implicitly in
pcim_setup_msi_release(). This callback ultimately uses pci_intx(), which
is problematic since the callback runs on driver detach.
That problem has last been described here:
https://lore.kernel.org/all/ee44ea7ac760e73edad3f20b30b4d2fff66c1a85.camel@redhat.com/
Replace the call to pci_intx() with one to the never-managed version
pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-9-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
vfio enables its PCI device with pci_enable_device(). Thus, it needs the
never-managed version.
Replace pci_intx() with pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-8-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
cardreader/rtsx_pcr.c and tifm_7xx1.c enable their PCI devices with
pci_enable_device(). Thus, they need the never-managed version.
Replace pci_intx() with pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-7-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
hw/amd and how/intel enable their PCI devices with pci_enable_device().
Thus, they need the never-managed version.
Replace pci_intx() with pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-6-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> # ntb_hw_amd.c
Acked-by: Dave Jiang <dave.jiang@intel.com> # ntb_hw_gen1.c
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.
xen enables its PCI device with pci_enable_device(). Thus, it needs the
never-managed version.
Replace pci_intx() with pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-4-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Juergen Gross <jgross@suse.com>
|
|
pci_intx() is a hybrid function which sometimes performs devres operations,
depending on whether pcim_enable_device() has been used to enable the
pci_dev. This sometimes-managed nature of the function is problematic.
Notably, it causes the function to allocate under some circumstances which
makes it unusable from interrupt context.
Export pcim_intx() (which is always managed) and rename __pcim_intx()
(which is never managed) to pci_intx_unmanaged() and export it as well.
Then all callers of pci_intx() can be ported to the version they need,
depending whether they use pci_enable_device() or pcim_enable_device().
Link: https://lore.kernel.org/r/20241209130632.132074-3-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
|
|
It's safe to send PME_TURN_OFF message regardless of whether the link is up
or down, so don't test the LTSSM state before sending the PME_TURN_OFF
message.
Only print an error message when the LTSSM is not in DETECT or POLL. There
shouldn't be an error when no Endpoint is connected at all.
Link: https://lore.kernel.org/r/20241210081557.163555-3-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
In some of the powerpc platforms, event group testcase fails as below:
# perf test -v 'Event groups'
69: Event groups :
--- start ---
test child forked, pid 9765
Using CPUID 0x00820200
Using hv_24x7 for uncore pmu event
0x0 0x0, 0x0 0x0, 0x0 0x0: Fail
0x0 0x0, 0x0 0x0, 0x1 0x3: Pass
The testcase creates various combinations of hw, sw and uncore
PMU events and verify group creation succeeds or fails as expected.
This tests one of the limitation in perf where it doesn't allow
creating a group of events from different hw PMUs.
The testcase starts a leader event and opens two sibling events.
The combination the fails is three hardware events in a group.
"0x0 0x0, 0x0 0x0, 0x0 0x0: Fail"
Type zero and config zero which translates to PERF_TYPE_HARDWARE
and PERF_COUNT_HW_CPU_CYCLE. There is event constraint in powerpc
that events using same counter cannot be programmed in a group.
Here there is one alternative event for cycles, hence one leader
and only one sibling event can go in as a group.
if all three events (leader and two sibling events), are hardware
events, use instructions as one of the sibling event. Since
PERF_COUNT_HW_INSTRUCTIONS is a generic hardware event and present
in all architectures, use this as third event.
Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250110094620.94976-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The comparison function cmpworker() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:
Symmetry: If x < y, then y > x.
Transitivity: If x < y and y < z, then x < z.
In its current implementation, cmpworker() incorrectly returns 0 when
w1->tid < w2->tid, which breaks both symmetry and transitivity. This
violation causes undefined behavior, potentially leading to issues such
as memory corruption in glibc [1].
Fix the issue by returning -1 when w1->tid < w2->tid, ensuring
compliance with the C standard and preventing undefined behavior.
Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 121dd9ea0116 ("perf bench: Add epoll parallel epoll_wait benchmark")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
On the i.MX8QM, PCIe link can't be re-established again in
dw_pcie_resume_noirq(), if the LTSSM_EN bit is not cleared
properly in dw_pcie_suspend_noirq().
So, add dw_pcie_stop_link() to dw_pcie_suspend_noirq() to fix
this issue and to align the suspend/resume functions since there
is dw_pcie_start_link() in dw_pcie_resume_noirq() already.
Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Link: https://lore.kernel.org/r/20241210081557.163555-2-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
The Root Complex specific device tree binding for pcie-dw-rockchip has the
'sys' interrupt marked as required.
The driver requests the 'sys' IRQ unconditionally, and errors out if not
provided.
Thus, we can unconditionally set 'use_linkup_irq', so dw_pcie_host_init()
doesn't wait for the link to come up.
This will skip the wait for link up (since the bus will be enumerated once
the link up IRQ is triggered), which reduces the bootup time.
Link: https://lore.kernel.org/r/20250113-rockchip-no-wait-v1-1-25417f37b92f@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|