Age | Commit message (Collapse) | Author |
|
Introduce `mlx5_esw_devlink_rate_node_tc_bw_set()` and
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` with no-op logic.
Future patches will add support for setting traffic class bandwidth
on rate objects.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a NULL check before accessing device memory to prevent a crash if
dev->dm allocation in mlx5_init_once() fails.
Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core")
Signed-off-by: Stav Aviram <saviram@nvidia.com>
Link: https://patch.msgid.link/c88711327f4d74d5cebc730dc629607e989ca187.1751370035.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Failing during the initialization of root_namespace didn't cleanup
the priorities of the namespace on which the failure occurred.
Properly cleanup said priorities on failure.
Fixes: 52931f55159e ("net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering domain")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://patch.msgid.link/78cf89b5d8452caf1e979350b30ada6904362f66.1751451780.git.leon@kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In both the read callback for struct cyclecounter, and in struct
timecounter, struct cyclecounter is declared as a const pointer.
Unfortunatly, a number of users of this pointer treat it as a non-const
pointer as it is burried in a larger structure that is heavily modified by
the callback function when accessed. This lie had been hidden by the fact
that container_of() "casts away" a const attribute of a pointer without any
compiler warning happening at all.
Fix this all up by removing the const attribute in the needed places so
that everyone can see that the structure really isn't const, but can,
and is, modified by the users of it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/2025070124-backyard-hurt-783a@gregkh
|
|
Currently when xdp_rxq_info_reg_mem_model() fails in the XSK path, the
error handling incorrectly jumps to err_destroy_page_pool. While this
may not cause errors, we should make it jump to the correct location.
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the new ib_alloc_device_with_net() API to allocate the IB device
so that it is properly bound to the network namespace obtained via
mlx5_core_net(). This change ensures correct namespace association
(e.g., for containerized setups).
Additionally, expose mlx5_core_net so that RDMA driver can use it.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
RDMA TRANSPORT domains were initially limited to a single priority.
This change allows the domains to have multiple priorities, making
it possible to add several rules and control the order in which
they're evaluated.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/b299cbb4c8678a33da6e6b6988b5bf6145c54b88.1750148083.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20250618203823.1336156-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc3).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The message "Error getting PHY irq. Use polling instead"
is emitted when the mlxbf_gige driver is loaded by the
kernel before the associated gpio-mlxbf driver, and thus
the call to get the PHY IRQ fails since it is not yet
available. The driver probe() must return -EPROBE_DEFER
if acpi_dev_gpio_irq_get_by() returns the same.
Fixes: 6c2a6ddca763 ("net: mellanox: mlxbf_gige: Replace non-standard interrupt handling")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250618135902.346-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Static analysis shows that pointer "my_ets" cannot be NULL because it
points to the object "struct ieee_ets".
Remove the extra NULL check. It is meaningless and harms the readability
of the code.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250616045034.26000-1-a.vatoropin@crpt.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Drivers that are using ops lock and don't depend on RTNL lock
still need to manage it because udp_tunnel's RTNL dependency.
Introduce new udp_tunnel_nic_lock and use it instead of
rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from
udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to
grab udp_tunnel_nic_lock mutex and might sleep).
Cover more places in v4:
- netlink
- udp_tunnel_notify_add_rx_port (ndo_open)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_notify_del_rx_port (ndo_stop)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_get_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- udp_tunnel_drop_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- udp_tunnel_nic_reset_ntf (ndo_open)
- notifiers
- udp_tunnel_nic_netdevice_event, depending on the event:
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- ethnl_tunnel_info_reply_size
- udp_tunnel_nic_set_port_priv (two intel drivers)
Cc: Michael Chan <michael.chan@broadcom.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rely on existing IB_MTU_XXX definitions which exist in ib_verbs.h.
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/382c91ee506e7f1f3c1801957df6b28963484b7d.1750147222.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Declare netmem TX support in netdev.
As required, use the netmem aware dma unmapping APIs
for unmapping netmems in tx completion path.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-13-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In mlx5, tcp header-data split requires HW GRO to be on.
Enabling it fails when HW GRO is off.
mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled.
Finally, when tcp data split is disabled, features are updated to maybe
remove the forced HW GRO.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate
and create the new channel resources, similar to
mlx5e_safe_switch_params, but here we do it for a single channel using
existing params, sort of a clone channel.
To swap the old channel with the new one, we deactivate and close the
old channel then replace it with the new one, since the swap procedure
doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start).
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On netdev_rx_queue_restart, a special type of page pool maybe expected.
In this patch declare support for UNREADABLE netmem iov pages in the
pool params only when header data split shampo RQ mode is enabled, also
set the queue index in the page pool params struct.
Shampo mode requirement: Without header split rx needs to peek at the data,
we can't do UNREADABLE_NETMEM.
The patch also enables the use of a separate page pool for headers when
a memory provider is installed for the queue, otherwise the same common
page pool continues to be used.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_page_frag holds the physical page itself, to naturally support
zc page pools, remove physical page reference from mlx5 and replace it
with netmem_ref, to avoid internal handling in mlx5 for net_iov backed
pages.
SHAMPO can issue packets that are not split into header and data. These
packets will be dropped if the data part resides in a net_iov as the
driver can't read into this area.
No performance degradation observed.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow allocating a separate page pool for headers when SHAMPO is on.
This will be useful for adding support to zc page pool, which has to be
different from the headers page pool.
For now, the pools are the same.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing HW capabilities, declare the feature in
netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev.
No functional change here as all by default disabled features are
explicitly disabled at the bottom of the function.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Two SHAMPO params are static and always the same, remove them from the
global mlx5e_params struct.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop redundant SHAMPO structure alloc/free functions.
Gather together function calls pertaining to header split info, pass
header per WQE (hd_per_wqe) as parameter to those function to avoid use
before initialization future mistakes.
Allocate HW GRO related info outside of the header related info scope.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RX Flow Hashing supports using different configuration for different
RSS contexts. Only two drivers seem to support it. Make sure we
uniformly error out for drivers which don't.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611145949.2674086-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc2).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the link is up, either eth_proto_oper or ext_eth_proto_oper
typically reports the active link protocol, from which both speed
and number of lanes can be retrieved. However, in certain cases,
such as when a NIC is connected via a non-standard cable, the
firmware may not report the protocol.
In such scenarios, the speed can still be obtained from the
data_rate_oper field in PTYS register. Since data_rate_oper
provides only speed information and lacks lane details, it is
incorrect to derive the number of lanes from it.
This patch corrects the behavior by setting the number of lanes to
UNKNOWN instead of incorrectly using MAX_LANES when relying on
data_rate_oper.
Fixes: 7e959797f021 ("net/mlx5e: Enable lanes configuration when auto-negotiation is off")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, a unique tunnel id was added for the matching on TC
non-zero chains, to support inner header rewrite with goto action.
Later, it was used to support VF tunnel offload for vxlan, then for
Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is
used to parse tunnel options. For Geneve, if there is TLV option, a
object is created, or refcnt is added if already exists. But the
temporary mlx5_flow_spec is directly freed after parsing, which causes
the leak because no information regarding the object is saved in
flow's mlx5_flow_spec, which is used to free the object when deleting
the flow.
To fix the leak, call mlx5_geneve_tlv_option_del() before free the
temporary spec if it has TLV object.
Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When there are more than one destinations, we create a FW flow
table and provide it with all the destinations. FW requires to
have wire as the last destination in the list (if it exists),
otherwise the operation fails with FW syndrome.
This patch fixes the destination array action creation: if it
contains a wire destination, it is moved to the end.
Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix missing field handling in definer - outer IP version.
Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The newly introduced mutex is only used for reformat actions, but it was
initialized for modify header instead.
The struct that contains the mutex is zero-initialized and an all-zero
mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES.
Fixes: b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When attempting to add a rule to an existing flow group, if a matching
flow group exists but is not active, the error code returned should be
EAGAIN, so that the rule can be added to the matching flow group once
it is active, rather than ENOENT, which indicates that no matching
flow group was found.
Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix shutdown flow UAF when a virtual function is created on the embedded
chip (ECVF) of a BlueField device. In such case the vport acl ingress
table is not properly destroyed.
ECVF functionality is independent of ecpf_vport_exists capability and
thus functions mlx5_eswitch_(enable|disable)_pf_vf_vports() should not
test it when enabling/disabling ECVF vports.
kernel log:
[] refcount_t: underflow; use-after-free.
[] WARNING: CPU: 3 PID: 1 at lib/refcount.c:28
refcount_warn_saturate+0x124/0x220
----------------
[] Call trace:
[] refcount_warn_saturate+0x124/0x220
[] tree_put_node+0x164/0x1e0 [mlx5_core]
[] mlx5_destroy_flow_table+0x98/0x2c0 [mlx5_core]
[] esw_acl_ingress_table_destroy+0x28/0x40 [mlx5_core]
[] esw_acl_ingress_lgcy_cleanup+0x80/0xf4 [mlx5_core]
[] esw_legacy_vport_acl_cleanup+0x44/0x60 [mlx5_core]
[] esw_vport_cleanup+0x64/0x90 [mlx5_core]
[] mlx5_esw_vport_disable+0xc0/0x1d0 [mlx5_core]
[] mlx5_eswitch_unload_ec_vf_vports+0xcc/0x150 [mlx5_core]
[] mlx5_eswitch_disable_sriov+0x198/0x2a0 [mlx5_core]
[] mlx5_device_disable_sriov+0xb8/0x1e0 [mlx5_core]
[] mlx5_sriov_detach+0x40/0x50 [mlx5_core]
[] mlx5_unload+0x40/0xc4 [mlx5_core]
[] mlx5_unload_one_devl_locked+0x6c/0xe4 [mlx5_core]
[] mlx5_unload_one+0x3c/0x60 [mlx5_core]
[] shutdown+0x7c/0xa4 [mlx5_core]
[] pci_device_shutdown+0x3c/0xa0
[] device_shutdown+0x170/0x340
[] __do_sys_reboot+0x1f4/0x2a0
[] __arm64_sys_reboot+0x2c/0x40
[] invoke_syscall+0x78/0x100
[] el0_svc_common.constprop.0+0x54/0x184
[] do_el0_svc+0x30/0xac
[] el0_svc+0x48/0x160
[] el0t_64_sync_handler+0xa4/0x12c
[] el0t_64_sync+0x1a4/0x1a8
[] --[ end trace 9c4601d68c70030e ]---
Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When firmware asks the driver to allocate more pages, using event of
give_pages, the driver should always allocate it from same NUMA, the
original device NUMA. Current code uses dev_to_node() which can result
in different NUMA as it is changed by other driver flows, such as
mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
allocating firmware pages.
Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Devlink info allows to expose serial number and board serial number
Get the values from PCI VPD and expose it.
$ devlink dev info
pci/0000:08:00.0:
driver mlx5_core
serial_number e4397f872caeed218000846daa7d2f49
board.serial_number MT2314XZ00YA
versions:
fixed:
fw.psid MT_0000000894
running:
fw.version 28.41.1000
fw 28.41.1000
stored:
fw.version 28.41.1000
fw 28.41.1000
auxiliary/mlx5_core.eth.0:
driver mlx5_core.eth
pci/0000:08:00.1:
driver mlx5_core
serial_number e4397f872caeed218000846daa7d2f49
board.serial_number MT2314XZ00YA
versions:
fixed:
fw.psid MT_0000000894
running:
fw.version 28.41.1000
fw 28.41.1000
stored:
fw.version 28.41.1000
fw 28.41.1000
auxiliary/mlx5_core.eth.1:
driver mlx5_core.eth
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250610025128.109232-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
The "freq" variable is in terms of MHz and "max_val_cycles" is in terms
of Hz. The fact that "max_val_cycles" is a u64 suggests that support
for high frequency is intended but the "freq_khz * 1000" would overflow
the u32 type if we went above 4GHz. Use unsigned long long type for the
mutliplication to prevent that.
Fixes: 31c128b66e5b ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aDbFHe19juIJKjsb@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Merge in late fixes to prepare for the 6.16 net-next PR.
No conflicts nor adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
'struct thermal_zone_device_ops' are not modified in this driver.
Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.
While at it, also constify a struct thermal_zone_params.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
24899 8036 0 32935 80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o
After:
=====
text data bss dec hex filename
25379 7556 0 32935 80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/4516676973f5adc1cdb76db1691c0f98b6fa6614.1748164348.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This was intended to be negative -ENOMEM but the '-' character was left
off accidentally. This typo doesn't affect runtime because the caller
treats all non-zero returns the same.
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/aDCbjNcquNC68Hyj@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function mlx5_query_nic_vport_node_guid() calls the function
mlx5_query_nic_vport_context() but does not check its return value.
A proper implementation can be found in mlx5_nic_vport_query_local_lb().
Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.
Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250524163425.1695-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A representor netdev does not correspond to real hardware that needs to
be updated when setting the MAC address. The default eth_mac_addr() is
sufficient for simply updating the netdev's MAC address with validation.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The function mlx5_query_nic_vport_qkey_viol_cntr() calls the function
mlx5_query_nic_vport_context() but does not check its return value. This
could lead to undefined behavior if the query fails. A proper
implementation can be found in mlx5_nic_vport_query_local_lb().
Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.
Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250521133620.912-1-vulab@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Remove some unnecessary strscpy_pad() size arguments.
From Thorsten Blum.
2) Correct use of xso.real_dev on bonding offloads.
Patchset from Cosmin Ratiu.
3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
From Chiachang Wang.
4) Refactor migration setup during cloning. This was
done after the clone was created. Now it is done
in the cloning function itself.
From Chiachang Wang.
5) Validate assignment of maximal possible SEQ number.
Prevent from setting to the maximum sequrnce number
as this would cause for traffic drop.
From Leon Romanovsky.
6) Prevent configuration of interface index when offload
is used. Hardware can't handle this case.i
From Leon Romanovsky.
7) Always use kfree_sensitive() for SA secret zeroization.
From Zilin Guan.
ipsec-next-2025-05-23
* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: use kfree_sensitive() for SA secret zeroization
xfrm: prevent configuration of interface index when offload is used
xfrm: validate assignment of maximal possible SEQ number
xfrm: Refactor migration setup during the cloning process
xfrm: Migrate offload configuration
bonding: Fix multiple long standing offload races
bonding: Mark active offloaded xfrm_states
xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
xfrm: Remove unneeded device check from validate_xmit_xfrm
xfrm: Use xdo.dev instead of xdo.real_dev
net/mlx5: Avoid using xso.real_dev unnecessarily
xfrm: Remove unnecessary strscpy_pad() size arguments
====================
Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch convert mlx5 to use the new netdev instance lock in addition
to the pre-existing state_lock (and the RTNL).
mlx5e_priv.state_lock was already used throughout mlx5 to protect
against concurrent state modifications on the same netdev, usually in
addition to the RTNL. The new netdev instance lock will eventually
replace it, but for now, it is acquired in addition to the existing
locks in the order RTNL -> instance lock -> state_lock.
All three netdev types handled by mlx5 are converted to the new style of
locking, because they share a lot of code related to initializing
channels and dealing with NAPI, so it's better to convert all three
rather than introduce different assumptions deep in the call stack
depending on the type of device.
Because of the nature of the call graphs in mlx5, it wasn't possible to
incrementally convert parts of the driver to use the new lock, since
either all call paths into NAPI have to possess the new lock if the
*_locked variants are used, or none of them can have the lock.
One area which required extra care is the interaction between closing
channels and devlink health reporter tasks.
Previously, the recovery tasks were unconditionally acquiring the
RTNL, which could lead to deadlocks in these scenarios:
T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close
-> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
-> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&ptpsq->report_unhealthy_work) waits for
T2: mlx5e_ptpsq_unhealthy_work -> mlx5e_reporter_tx_ptpsq_unhealthy
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_ptpsq_unhealthy_recover which does:
rtnl_lock(); => Deadlock.
Another similar instance of this is:
T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close
-> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
-> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&sq->recover_work) waits for
T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_err_cqe_recover which does:
rtnl_lock(); => Another deadlock.
Fix that by using the same pattern previously done in
mlx5e_tx_timeout_work, where the RTNL was repeatedly tried to be
acquired until either:
a) it is successfully acquired or
b) there's no need for the work to be done any more (channel is being
closed).
Now, for all three recovery tasks, the instance lock is repeatedly tried
to be acquired until successful or the channel/SQ is closed.
As a side-effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state)
check from mlx5e_tx_timeout_work, it's weaker than
!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) and unnecessary.
Future patches will introduce new call paths (from netdev queue
management ops) which can close channels (and call cancel_work_sync on
the recovery tasks) without the RTNL lock and only with the netdev
instance lock.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747829342-1018757-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's no explanation in the original commit of why that was done, but
presumably flashing takes a long time and holding RTNL for so long
blocks other interactions with the netdev layer.
However, the stack is moving towards netdev instance locking and
dropping and reacquiring RTNL in the context of flashing introduces
locking ordering issues: RTNL must be acquired before the netdev
instance lock and released after it.
This patch therefore takes the simpler approach by no longer dropping
and reacquiring the RTNL, as soon RTNL for ethtool will be removed,
leaving only the instance lock to protect against races.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747829342-1018757-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Having adjacent accelerated modify header actions (so-called
pattern-argument actions) may result in inconsistent outcome.
These inconsistencies can take the form of writes to the same
field or a read coupled with a write to the same field. The
solution is to detect such dependencies and insert nops between
the offending actions.
The existing implementation had a few issues, which pretty much
required a complete rewrite of the code that handles these
dependencies.
In the new implementation we're doing the following:
* Checking any two adjacent actions for conflicts (not just
odd-even pairs).
* Marking 'set' and 'add' action fields as destination, rather
than source, for the purposes of checking for conflicts.
* Checking all types of actions ('add', 'set', 'copy') for
dependencies.
* Managing offsets of the args in the buffer - copy the action
args to the right place in the buffer.
* Checking that after inserting nops we're still within the number
of supported actions - return an error otherwise.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747766802-958178-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix typo - rename 'nope_locations' to 'nop_locations', which describes
the locations of 'nop' actions. To shorten the lines, this renaming
also required some refactoring.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747766802-958178-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hardware steering handles actions differently from firmware, but for
termination rules that use encapsulation the firmware needs to be aware
of the action.
Fix this by registering reformat actions with the firmware the first
time this is needed. To do this, add a third possible owner for an
action, and also a lock to protect against registration of the same
action from different threads.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747766802-958178-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware reformat id is a u32 and can't safely be returned as an
int. Because the functions also need a way to signal error, prefer to
return the id as an output parameter and keep the return code only for
success/error.
While we're at it, also extract some duplicate code to fetch the
reformat id from a more generic struct pkt_reformat.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1747766802-958178-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|