Age | Commit message (Collapse) | Author |
|
Extend the "rdma sys" command to display whether RDMA
monitoring is supported.
RDMA monitoring is not supported in mlx4 because it does
not use the ib_device_set_netdev() API, which sends the
RDMA events.
Example output for kernel where monitoring is supported:
$ rdma sys show
netns shared privileged-qkey off monitor on copy-on-fork on
Example output for kernel where monitoring is not supported:
$ rdma sys show
netns shared privileged-qkey off monitor off copy-on-fork on
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-8-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Introduce a new netlink command to allow rdma event monitoring.
The rdma events supported now are IB device
registration/unregistration and net device attachment/detachment.
Example output of rdma monitor and the commands which trigger
the events:
$ rdma monitor
$ rmmod mlx5_ib
[UNREGISTER] dev 1 rocep8s0f1
[UNREGISTER] dev 0 rocep8s0f0
$ modprobe mlx5_ib
[REGISTER] dev 2 mlx5_0
[NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2
[REGISTER] dev 3 mlx5_1
[NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
[UNREGISTER] dev 2 rocep8s0f0
[REGISTER] dev 4 mlx5_0
[NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2
$ echo 4 > /sys/class/net/eth2/device/sriov_numvfs
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7
[REGISTER] dev 5 mlx5_0
[NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8
[REGISTER] dev 6 mlx5_0
[NETDEV_ATTACH] dev 6 mlx5_0 port 1 netdev 12 eth9
[REGISTER] dev 7 mlx5_0
[NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10
[REGISTER] dev 8 mlx5_0
[NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11
$ echo 0 > /sys/class/net/eth2/device/sriov_numvfs
[UNREGISTER] dev 5 rocep8s0f0v0
[UNREGISTER] dev 6 rocep8s0f0v1
[UNREGISTER] dev 7 rocep8s0f0v2
[UNREGISTER] dev 8 rocep8s0f0v3
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 2
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 3
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 4
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 5
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-7-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The IB layer provides a common interface to store and get net
devices associated to an IB device port (ib_device_set_netdev()
and ib_device_get_netdev()).
Previously, mlx5_ib stored and managed the associated net devices
internally.
Replace internal net device management in mlx5_ib with
ib_device_set_netdev() when attaching/detaching a net device and
ib_device_get_netdev() when retrieving the net device.
Export ib_device_get_netdev().
For mlx5 representors/PFs/VFs and lag creation we replace the netdev
assignments with the IB set/get netdev functions.
In active-backup mode lag the active slave net device is stored in the
lag itself. To assure the net device stored in a lag bond IB device is
the active slave we implement the following:
- mlx5_core: when modifying the slave of a bond we send the internal driver event
MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE.
- mlx5_ib: when catching the event call ib_device_set_netdev()
This patch also ensures the correct IB events are sent in switchdev lag.
While at it, when in multiport eswitch mode, only a single IB device is
created for all ports. The said IB device will receive all netdev events
of its VFs once loaded, thus to avoid overwriting the mapping of PF IB
device to PF netdev, ignore NETDEV_REGISTER events if the ib device has
already been mapped to a netdev.
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The caller of ib_device_get_netdev() relies on its result to accurately
match a given netdev with the ib device associated netdev.
ib_device_get_netdev returns NULL when the IB device associated
netdev is unregistering, preventing the caller of matching netdevs properly.
Thus, remove this optimization and return the netdev even if
it is undergoing unregistration, allowing matching by the caller.
This change ensures proper netdev matching and reference count handling
by the caller of ib_device_get_netdev/ib_device_set_netdev API.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-5-michaelgur@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
phys_port_cnt of the IB device must be initialized before calling
ib_device_set_netdev().
Previously, phys_port_cnt was initialized in the mlx5_ib init function.
Remove this initialization to allow setting it separately, providing
the flexibility to call ib_device_set_netdev before registering the
IB device.
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-4-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Report the upper device's state as the RDMA port state only in RoCE LAG or
switchdev LAG.
Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-3-michaelgur@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Check if RoCE LAG is active before calling the LAG layer for netdev.
This clarifies if LAG is active. No behavior changes with this patch.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-2-michaelgur@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Consider also the query_vuid cap before enabling the data_direct
functionality.
This may prevent a syndrome from the FW in case the query_vuid command
is not supported. (e.g. migratable VF)
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@nvidia.com>
Link: https://patch.msgid.link/274c4f6f1ac0b1078243dd296695a49dbe58e7d1.1725907637.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When running over new FW that supports the new memory scheme ODP, set
the cap in the FW to signal the FW we are working in the new scheme.
In the memory scheme ODP the per_transport_service capabilities are RO
for the driver so we skip their setting.
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-9-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Remove a double include (Lucas)
- Fix null checks and UAF (Brost)
- Fix access_ok check in user_fence_create (Nirmoy)
- Fix compat IS_DISPLAY_STEP() range (Jani)
- OA fix (Ashutosh)
- Fixes in show_meminfo (Auld)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZuL-sORu54zfz1Lf@intel.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
An off-by-one fix for the CMA DMA-buf heap, An init fix for nouveau, a
config dependency fix for stm, a syncobj leak fix, and two iommu fixes
for tegra and rockchip.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912-phenomenal-upbeat-grouse-a26781@houat
|
|
mlx5e_free_rq previously cleaned resources in an order that was not the
reverse of the resource allocation order in mlx5e_alloc_rq.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-16-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When SHAMPO can't identify the protocol/header of a packet, it will
yield a packet that is not split - all the packet is in the data part.
Count this value in packets and bytes.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-15-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new command status MLX5_CMD_STAT_NOT_READY to handle cases
where the firmware is not ready.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20240911201757.1505453-14-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SFs didn't allow to configure IRQ affinity for its vectors. Allow users
to configure the affinity of the SFs irqs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20240911201757.1505453-13-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sync reset request is nacked by the driver when PCIe bridge connected to
mlx5 device has HotPlug interrupt enabled. However, when using reset
method of hot reset this check can be skipped as Hotplug is supported on
this reset method.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On device that supports sync reset for firmware activate using hot
reset, the driver queries the required reset method while handling the
sync reset request. If the required reset method is hot reset, the
driver will use pci_reset_bus() to reset the PCI link instead of the
link toggle.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Native capability for some steering engines lacks support for adding an
additional match with the same value to the same flow group. To accommodate
the NO APPEND flag in these scenarios, we include the new rule in the
existing flow table entry (fte) without immediate hardware commitment. When
a request is made to delete the corresponding hardware rule, we then commit
the pending rule to hardware.
Only one pending rule is supported because NO_APPEND is primarily used
during replacement operations. In this scenario, a rule is initially added.
When it needs replacement, the new rule is added with NO_APPEND set. Only
after the insertion of the new rule is the original rule deleted.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce a dedicated structure to encapsulate flow context, actions,
destination count, and modification mask. This refactoring lays the
groundwork for forthcoming patches that will integrate the NO APPEND
software logic. Future modifications should focus solely on these
specific fields.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Counter is in struct fte, remove it.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Downstream patches will need this as we might not want to reset
it when a pending rule is connected to the FTE.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As preparation for HW Steering support, where the function
get_root_namespace() is needed to get root FDB, make it an API function
and rename it to mlx5_get_root_namespace().
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20240911201757.1505453-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As preparation for HW steering support in fs core level, move SW
steering helper function that can be reused by HW steering to fs_cmd.h.
The function mlx5_fs_cmd_is_fw_term_table() checks if a flow table is a
flow steering termination table and so should be handled by FW steering.
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fixed all the '-ret' returns in error flow of functions to 'ret',
as the internal functions are already returning negative error values
(e.g. -EINVAL)
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Changed all the functions comments to adhere with kernel-doc formatting.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20240911094445.1922476-4-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.
Fixes: bbb96dc7fa1a ("enetc: Factor out the traffic start/stop procedures")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20240911094445.1922476-3-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20240911094445.1922476-2-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both bareudp_xmit_skb() and bareudp6_xmit_skb() read their skb's inner
IP header to get its ECN value (with ip_tunnel_ecn_encap()). Therefore
we need to ensure that the inner IP header is part of the skb's linear
data.
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/267328222f0a11519c6de04c640a4f87a38ea9ed.1726046181.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bareudp reads the inner IP header to get the ECN value. Therefore, it
needs to ensure that it's part of the skb's linear data.
This is similar to the vxlan and geneve fixes for that same problem:
* commit f7789419137b ("vxlan: Pull inner IP header in vxlan_rcv().")
* commit 1ca1ba465e55 ("geneve: make sure to pull inner header in
geneve_rx()")
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5205940067c40218a70fbb888080466b2fc288db.1726046181.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-09-11
We've added 12 non-merge commits during the last 16 day(s) which contain
a total of 20 files changed, 228 insertions(+), 30 deletions(-).
There's a minor merge conflict in drivers/net/netkit.c:
00d066a4d4ed ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
d96608794889 ("netkit: Disable netpoll support")
The main changes are:
1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used
to easily parse skbs in BPF programs attached to tracepoints,
from Philo Lu.
2) Add a cond_resched() point in BPF's sock_hash_free() as there have
been several syzbot soft lockup reports recently, from Eric Dumazet.
3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which
got noticed when zero copy ice driver started to use it,
from Maciej Fijalkowski.
4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs
up via netif_receive_skb_list() to better measure latencies,
from Daniel Xu.
5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann.
6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but
instead gather the actual value via /proc/sys/net/core/max_skb_frags,
also from Maciej Fijalkowski.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
sock_map: Add a cond_resched() in sock_hash_free()
selftests/bpf: Expand skb dynptr selftests for tp_btf
bpf: Allow bpf_dynptr_from_skb() for tp_btf
tcp: Use skb__nullable in trace_tcp_send_reset
selftests/bpf: Add test for __nullable suffix in tp_btf
bpf: Support __nullable argument suffix for tp_btf
bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv
selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()
tcp_bpf: Remove an unused parameter for bpf_tcp_ingress()
bpf, sockmap: Correct spelling skmsg.c
netkit: Disable netpoll support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZuKTN2XngNhBB3z3@linux
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.11-2024-09-11:
amdgpu:
- Avoid races between set_drr() functions and dc_state_destruct()
- Fix regerssion related to zpos
- Fix regression related to overlay cursor
- SMU 14.x updates
- JPEG fixes
- Silence an UBSAN warning
amdkfd:
- Fetch cacheline size from IP discovery
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911170528.838655-1-alexander.deucher@amd.com
|
|
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.4.0.5
This patch set contains bug fixes related to HBA state clean ups, FCP
discovery on older adapters, kref imbalances, log message improvements,
and support for a new diagnostic loopback testing mode.
The patches were cut against Martin's 6.12/scsi-queue tree.
Link: https://lore.kernel.org/r/20240912232447.45607-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update lpfc version to 14.4.0.5
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The VMID feature adds an extra application services header to each frame.
As such, the loopback test path is updated to accommodate the extra
application header.
Changes include filling in APPID and WQES bit fields for XMIT_SEQUENCE64
commands, a special loopback source APPID for verifying received loopback
data matches what is sent, and increasing ELS WQ size to accommodate the
APPID field in loopback test mode.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Revise certain log messages marked as KERN_ERR LOG_TRACE_EVENT to
KERN_WARNING and use the lpfc_vlog_msg() macro to still log the event. The
benefit is that events of interest are still logged and the entire trace
buffer is not dumped with extraneous logging information when using default
lpfc_log_verbose driver parameter settings.
Also, delete the keyword "fail" from such log messages as they aren't
really causes for concern. The log messages are more for warnings to a SAN
admin about SAN activity.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Deleting an NPIV instance requires all fabric ndlps to be released before
an NPIV's resources can be torn down. Failure to release fabric ndlps
beforehand opens kref imbalance race conditions. Fix by forcing the DA_ID
to complete synchronously with usage of wait_queue.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
With a FLOGI outstanding and loss of physical link connection to the fabric
for the duration of dev_loss_tmo, there is a fabric ndlp kref imbalance
that decrements the kref and sets the NLP_IN_RECOV_POST_DEV_LOSS flag at
the same time.
The issue is that when the FLOGI completion routine executes, the fabric
ndlp could already be freed because of the final kref put from the
dev_loss_tmo handler. Fix by early returning before the ndlp kref put if
the ndlp is deemed a candidate for NLP_IN_RECOV_POST_DEV_LOSS in the FLOGI
completion routine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
An older generation of HBAs are failing FCP discovery due to usage of an
outdated field in FCP command WQEs.
Fix by checking the SLI Interface Type register for applicable support of
32 Byte CDB commands, and restore a setting for a WQE path using normal 16
byte CDBs.
Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs")
Cc: stable@vger.kernel.org # v6.10+
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
It's possible for the driver to send a CMF_SYNC_WQE to nonresponsive
firmware during reset of the adapter. The phba link_state conditional
check is currently a strict == LPFC_LINK_DOWN, which does not cover
initialization states before reaching the LPFC_LINK_UP state.
Update the phba->link_state conditional to < LPFC_LINK_UP so that all
initialization states are covered before allowing sending CMF_SYNC_WQE.
Update taking of the hbalock to be during this link_state check as well.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During HBA stress testing, a spam of received PLOGIs exposes a resource
recovery bug causing leakage of lpfc_sqlq entries from the global
phba->sli4_hba.lpfc_els_sgl_list.
The issue is in lpfc_els_flush_cmd(), where the driver attempts to recover
outstanding ELS sgls when walking the txcmplq. Only CMD_ELS_REQUEST64_CRs
and CMD_GEN_REQUEST64_CRs are added to the abort and cancel lists. A check
for CMD_XMIT_ELS_RSP64_WQE is missing in order to recover LS_ACC usages of
the phba->sli4_hba.lpfc_els_sgl_list too.
Fix by adding CMD_XMIT_ELS_RSP64_WQE as part of the txcmplq walk when
adding WQEs to the abort and cancel list in lpfc_els_flush_cmd(). Also,
update naming convention from CRs to WQEs.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Ranjan Kumar <ranjan.kumar@broadcom.com> says:
Few Enhancements and minor fix of mpi3mr driver.
Link: https://lore.kernel.org/r/20240905102753.105310-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update driver version to 8.12.0.0.50.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-6-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During controller transitioning to READY state, if the controller is found
in transient states ("becoming ready" or "reset requested"), driver waits
for 510 secs even if the controller transitions out of these states
early. This causes an unnecessary delay of 510 secs in the overall firmware
initialization sequence.
Poll the controller state periodically (every 100 milliseconds) while
waiting for the controller to come out of the mentioned transient
states. Once the controller transits out of the transient states, come out
of the wait loop.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-5-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update MPI Headers to revision 34.
Signed-off-by: Prayas Patel <prayas.patel@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Make driver use the timestamp update interval value provided by firmware in
the driver page 1. If firmware fails to provide non-zero value, then the
driver will fall back to the driver defined macro.
Signed-off-by: Prayas Patel <prayas.patel@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-3-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When enabling the IOC request and polling for controller ready status, poll
for controller fault and reset history bit. If the controller is faulted
or the reset history bit is set, retry the initialization a maximum of
three times (2 retries) or if the cumulative time taken for all retries
exceeds 510 seconds.
Signed-off-by: Prayas Patel <prayas.patel@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
ENA currently supports the following customer metrics:
- `bw_in_allowance_exceeded`
- `bw_out_allowance_exceeded`
- `conntrack_allowance_exceeded`
- `linklocal_allowance_exceeded`
- `pps_allowance_exceeded`
This patch adds a new metric named:
`conntrack_allowance_available`.
Information about these metrics is available in [1].
In addition, the interface between the driver and the
device has been upgraded to allow more flexibility and
expendability to additional metrics in the future.
[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html#network-performance-metrics
Signed-off-by: Ron Beider <rbeider@amazon.com>
Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20240909084704.13856-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ENA Express metrics, called `ena_srd` are exposed to
customers via `ethtool`.
The metrics allow customers to check the configuration
(mode), tx/rx counters as well as resource utilization.
The documentation is also updated to provide a general
explanation about ENA Express as well as links for further
information about metrics and configurations.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20240909084704.13856-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|