Age | Commit message (Collapse) | Author |
|
Refine RoCE SRIOV resource configuration design,
using the INITIALIZE_FW's flag as an indication
for the new design to the firmware. RoCE driver does not
have to provision resources to VF when firmware
advertises support for RoCE resource management by NIC driver.
Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
During driver load, PF RDMA driver provisions resources
to the RDMA VFs. This logic takes into consideration of
the total number of VFs supported on the PF while
allocating resources. Firmware now advertises a capability
where NIC driver can allocate resources for RDMA VFs when
the user actually creates a VF. So this resource
distribution can be based on the number of active VFs.
This patch adds the support to check for the firmware
capability and follow the new RDMA VF resource allocation
strategy. The current logic in the RDMA driver will be
removed for the newer Firmware versions in a subsequent
patch in this series.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
All MTL and ARL SKUs share the same GSC FW, but the newer platforms are
only supported in newer blobs. In particular, ARL-S is supported
starting from 102.0.10.1878 (which is already the minimum required
version for ARL in the code), while ARL-H and ARL-U are supported from
102.1.15.1926. Therefore, the driver needs to check which specific ARL
subplatform its running on when verifying that the GSC FW is new enough
for it.
Fixes: 2955ae8186c8 ("drm/i915: ARL requires a newer GSC firmware")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241028233132.149745-1-daniele.ceraolospurio@intel.com
(cherry picked from commit 3c1d5ced18db8a67251c8436cf9bdc061f972bdb)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The struct device_link *link field in struct qcom_ice is only used to
store the result of a device_link_add call with the
DL_FLAG_AUTOREMOVE_SUPPLIER flag. With this flag, the resulting value
can only be used to check whether the link is present or not, as per the
device_link_add description, hence this commit removes the field.
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20241030025046.303342-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The newly added driver causes a warnings when enabling -Wunused-const-variables:
drivers/clk/qcom/gcc-ipq5424.c:1064:30: error: 'ftbl_gcc_q6_axi_clk_src' defined but not used [-Werror=unused-const-variable=]
1064 | static const struct freq_tbl ftbl_gcc_q6_axi_clk_src[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~
drivers/clk/qcom/gcc-ipq5424.c:957:30: error: 'ftbl_gcc_qpic_clk_src' defined but not used [-Werror=unused-const-variable=]
957 | static const struct freq_tbl ftbl_gcc_qpic_clk_src[] = {
| ^~~~~~~~~~~~~~~~~~~~~
drivers/clk/qcom/gcc-ipq5424.c:497:30: error: 'ftbl_gcc_qupv3_2x_core_clk_src' defined but not used [-Werror=unused-const-variable=]
497 | static const struct freq_tbl ftbl_gcc_qupv3_2x_core_clk_src[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to hopefully enable this warning by default in the future,
remove the data for now. If it gets used in the future, it can
trivially get added back.
Fixes: 21b5d5a4a311 ("clk: qcom: add Global Clock controller (GCC) driver for IPQ5424 SoC")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20241111103258.3336183-1-arnd@kernel.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The current loop code was based on the assumption
that there can be page leftovers from previous function calls.
This patch changes the allocation loop to make it clearer how
pages get allocated every MLX5E_SHAMPO_WQ_HEADER_PER_PAGE headers.
This change has no functional implications.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The info array is used to store a pointer to the
dma address of the header and to the frag page. However,
this array is not really required:
- The frag page can be calculated from the header index
frag page index = header index / headers per page.
- The dma address can be calculated through a formula:
dma page address + header offset.
This series gets rid of the info array and uses the above
formulas instead.
The current_page_index was used in conjunction with the info array to
store page fragment indices. This variable is dropped as well.
There was no performance regression observed.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the UMR allocation has been simplified, it is no longer
possible to have a leftover page from a previous call to
mlx5e_build_shampo_hd_umr().
This patch simplifies the code by switching the order of operations:
first take the frag page and then increment the index. This is more
straightforward and it also paves the way for dropping the info
array.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When calculating the index for the next frag page slot, the divisor is
incorrect: it should be the number of pages per queue not the number of
headers per queue. This is currently harmless because frag pages are not
used directly, but they are intermediated through the info array. But it
needs to be fixed as an upcoming patch will get rid of the info array.
This patch introduces a new pages per queue variable and plugs it in the
formula.
Now that this variable exists, additional code can be simplified in the
SHAMPO initialization code.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allocating page fragments for header data split is currently
more complicated than it should be. That's because the number
of KSM entries allocated is not aligned to the number of headers
per page. This leads to having leftovers in the next allocation
which require additional accounting and needlessly complicated
code.
This patch aligns (down) the number of KSM entries in the
UMR WQE to the number of headers per page by:
1) Aligning the max number of entries allocated per UMR WQE
(max_ksm_entries) to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
2) Aligning the total number of free headers to
MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
... and then it drops the extra accounting code from
mlx5e_build_shampo_hd_umr().
Although the number of entries allocated per UMR WQE is slightly
smaller due to aligning down, no performance impact was observed.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor esw_qos_vport_enable to support more generic configurations,
allowing it to be reused for new vport node types in future patches.
This refactor includes a new way to change the vport parent node by
disabling the current setup and re-enabling it with the new parent.
This change sets the foundation for adapting configuration based on the
parent type in future patches.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fold the esw_qos_vport_enable function into operations for configuring
maximum and minimum rates, simplifying QoS logic. This change
consolidates enabling and updating the scheduling element
configuration, streamlining how vport QoS is initialized and adjusted.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce helper functions to create and destroy scheduling elements,
allowing flexible configuration for different scheduling element types.
The new helper functions streamline the process by centralizing error
handling and logging through esw_qos_sched_elem_op_warn, which now
accepts the operation type (create, destroy, or modify).
The changes also adjust the esw_qos_vport_enable and
mlx5_esw_qos_vport_disable functions to leverage the new generalized
create/destroy helpers.
The destroy functions now log errors with esw_warn without returning
them. This prevents unnecessary error handling since the node was
already destroyed and no further action is required from callers.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor esw_qos_sched_elem_config to set bitmasks only when max_rate
or bw_share values change, allowing the function to configure nodes
with only one of these parameters.
This enables more flexible usage for nodes where only one parameter
requires configuration.
Remove scattered assignments and checks to centralize them within this
function, removing the now redundant esw_qos_set_node_max_rate
entirely.
With this refactor, also remove the assignment of the vport scheduling
node max rate to the parent max rate for unlimited vports
(where max rate is set to zero), as firmware already handles this
behavior.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor max_rate and min_rate setting functions to operate on
mlx5_esw_sched_node, allowing for generalized handling of both vports
and nodes.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change updates esw_qos_normalize_min_rate to not return errors,
significantly simplifying the code.
Normalization failures are software bugs, and it's unnecessary to
handle them with rollback mechanisms. Instead,
`esw_qos_update_sched_node_bw_share` and `esw_qos_normalize_min_rate`
now return void, with any errors logged as warnings to indicate
potential software issues.
This approach avoids compensating for hidden bugs and removes error
handling from all places that perform normalization, streamlining
future patches.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The E-switch mode was previously updated before removing and re-adding the
IB device, which could cause a temporary mismatch between the E-switch mode
and the IB device configuration.
To prevent this discrepancy, the IB device is now removed first, then
the E-switch mode is updated, and finally, the IB device is re-added.
This sequence ensures consistent alignment between the E-switch mode and
the IB device whenever the mode changes, regardless of the new mode value.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107194357.683732-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In Multi-PF (Socket Direct) configurations, when a loopback packet is
sent through one of the secondary devices, it will always be received
on the primary device. This causes the loopback layer to fail in
identifying the loopback packet as the devices are different.
To avoid false test failures, disable the loopback self-test in
Multi-PF configurations.
Fixes: ed29705e4ed1 ("net/mlx5: Enable SD feature")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In error flow of mlx5_tc_ct_entry_add_rule(), in case ct_rule_add()
callback returns error, zone_rule->attr is used uninitiated. Fix it to
use attr which has the needed pointer value.
Kernel log:
BUG: kernel NULL pointer dereference, address: 0000000000000110
RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
…
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x150/0x3e0
? exc_page_fault+0x74/0x140
? asm_exc_page_fault+0x22/0x30
? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core]
mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core]
? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
flow_offload_work_handler+0x142/0x320 [nf_flow_table]
? finish_task_switch.isra.0+0x15b/0x2b0
process_one_work+0x16c/0x320
worker_thread+0x28c/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xb8/0xf0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2d/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Non-uplink representor port does not support XDP. The patch clears
the xdp feature by checking the net_device_ops.ndo_bpf is set or not.
Verify using the netlink tool:
$ tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump dev-get
Representor netdev before the patch:
{'ifindex': 8,
'xdp-features': {'basic',
'ndo-xmit',
'ndo-xmit-sg',
'redirect',
'rx-sg',
'xsk-zerocopy'},
'xdp-rx-metadata-features': set(),
'xdp-zc-max-segs': 1,
'xsk-features': set()},
With the patch:
{'ifindex': 8,
'xdp-features': set(),
'xdp-rx-metadata-features': set(),
'xsk-features': set()},
Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kTLS tx handling code is using a mix of get_page() and
page_ref_inc() APIs to increment the page reference. But on the release
path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used.
This is an issue when using pages from large folios: the get_page()
references are stored on the folio page while the page_ref_inc()
references are stored directly in the given page. On release the folio
page will be dereferenced too many times.
This was found while doing kTLS testing with sendfile() + ZC when the
served file was read from NFS on a kernel with NFS large folios support
(commit 49b29a573da8 ("nfs: add support for large folios")).
Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The referenced commits introduced a two-step process for deleting FTEs:
- Lock the FTE, delete it from hardware, set the hardware deletion function
to NULL and unlock the FTE.
- Lock the parent flow group, delete the software copy of the FTE, and
remove it from the xarray.
However, this approach encounters a race condition if a rule with the same
match value is added simultaneously. In this scenario, fs_core may set the
hardware deletion function to NULL prematurely, causing a panic during
subsequent rule deletions.
To prevent this, ensure the active flag of the FTE is checked under a lock,
which will prevent the fs_core layer from attaching a new steering rule to
an FTE that is in the process of deletion.
[ 438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func
[ 438.968205] ------------[ cut here ]------------
[ 438.968654] refcount_t: decrement hit 0; leaking memory.
[ 438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110
[ 438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower]
[ 438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8
[ 438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110
[ 438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90
[ 438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286
[ 438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000
[ 438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0
[ 438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0
[ 438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0
[ 438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0
[ 438.980607] FS: 00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
[ 438.983984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0
[ 438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 438.986507] Call Trace:
[ 438.986799] <TASK>
[ 438.987070] ? __warn+0x7d/0x110
[ 438.987426] ? refcount_warn_saturate+0xfb/0x110
[ 438.987877] ? report_bug+0x17d/0x190
[ 438.988261] ? prb_read_valid+0x17/0x20
[ 438.988659] ? handle_bug+0x53/0x90
[ 438.989054] ? exc_invalid_op+0x14/0x70
[ 438.989458] ? asm_exc_invalid_op+0x16/0x20
[ 438.989883] ? refcount_warn_saturate+0xfb/0x110
[ 438.990348] mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core]
[ 438.990932] __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core]
[ 438.991519] ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core]
[ 438.992054] ? xas_load+0x9/0xb0
[ 438.992407] mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core]
[ 438.993037] mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core]
[ 438.993623] mlx5e_flow_put+0x29/0x60 [mlx5_core]
[ 438.994161] mlx5e_delete_flower+0x261/0x390 [mlx5_core]
[ 438.994728] tc_setup_cb_destroy+0xb9/0x190
[ 438.995150] fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
[ 438.995650] fl_change+0x11a4/0x13c0 [cls_flower]
[ 438.996105] tc_new_tfilter+0x347/0xbc0
[ 438.996503] ? ___slab_alloc+0x70/0x8c0
[ 438.996929] rtnetlink_rcv_msg+0xf9/0x3e0
[ 438.997339] ? __netlink_sendskb+0x4c/0x70
[ 438.997751] ? netlink_unicast+0x286/0x2d0
[ 438.998171] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 438.998625] netlink_rcv_skb+0x54/0x100
[ 438.999020] netlink_unicast+0x203/0x2d0
[ 438.999421] netlink_sendmsg+0x1e4/0x420
[ 438.999820] __sock_sendmsg+0xa1/0xb0
[ 439.000203] ____sys_sendmsg+0x207/0x2a0
[ 439.000600] ? copy_msghdr_from_user+0x6d/0xa0
[ 439.001072] ___sys_sendmsg+0x80/0xc0
[ 439.001459] ? ___sys_recvmsg+0x8b/0xc0
[ 439.001848] ? generic_update_time+0x4d/0x60
[ 439.002282] __sys_sendmsg+0x51/0x90
[ 439.002658] do_syscall_64+0x50/0x110
[ 439.003040] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes")
Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of PCI vectors allocated by the platform (which may be fewer
than requested) is currently not honored when creating the SF pool;
only the PCI MSI-X capability is considered.
As a result, when a platform allocates fewer vectors
(in non-dynamic mode) than requested, the PF and SF pools end up
with an invalid vector range.
This causes incorrect SF vector accounting, which leads to the
following call trace when an invalid IRQ vector is allocated.
This issue is resolved by ensuring that the platform's vector
limit is respected for both the SF and PF pools.
Workqueue: mlx5_vhca_event0 mlx5_sf_dev_add_active_work [mlx5_core]
RIP: 0010:pci_irq_vector+0x23/0x80
RSP: 0018:ffffabd5cebd7248 EFLAGS: 00010246
RAX: ffff980880e7f308 RBX: ffff9808932fb880 RCX: 0000000000000001
RDX: 00000000000001ff RSI: 0000000000000200 RDI: ffff980880e7f308
RBP: 0000000000000200 R08: 0000000000000010 R09: ffff97a9116f0860
R10: 0000000000000002 R11: 0000000000000228 R12: ffff980897cd0160
R13: 0000000000000000 R14: ffff97a920fec0c0 R15: ffffabd5cebd72d0
FS: 0000000000000000(0000) GS:ffff97c7ff9c0000(0000) knlGS:0000000000000000
? rescuer_thread+0x350/0x350
kthread+0x11b/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core.sf mlx5_core.sf.6: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
mlx5_core.sf mlx5_core.sf.7: firmware version: 32.43.356
mlx5_core.sf mlx5_core.sf.6 enpa1s0f0s4: renamed from eth0
mlx5_core.sf mlx5_core.sf.7: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IB representors depend on ETH representors, so the IB representors
should not exist without the ETH ones. When unloading the ETH
representors, the corresponding IB representors should be also
unloaded.
The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
introduced the use of the ib_device_set_netdev API in IB
repsresentors. ib_device_set_netdev() increments the refcount of
the representor's netdev when loading an IB representor and
decrements it when unloading.
Without the unloading of the IB representor, the refcount of the
representor's netdev remains greater than 0, preventing it from
being unregistered.
The patch uncovered an underlying bug where the eth representor is
unloaded, without unloading the IB representor.
This issue happened when using multiport E-switch and rebooting,
causing the shutdown to hang when unloading the ETH representor
because the refcount of the representor's netdevice was greater than 0.
Call trace:
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
ref_tracker: eth%d@00000000661d60f7 has 1/1 users at
ib_device_set_netdev+0x160/0x2d0 [ib_core]
mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib]
mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core]
mlx5_mpesw_work+0x236/0x330 [mlx5_core]
process_one_work+0x169/0x320
worker_thread+0x288/0x3a0
kthread+0xb8/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20
Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we have reduced the indentation level, clean up the code
formatting.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t9RQz-002Ff5-EA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The switch() statement doesn't sit very well with the preceeding if()
statements, so let's just convert everything to if()s. As a result of
the two preceding commits, there is now only one case in the switch()
statement. Remove the switch statement and reduce the code indentation.
Code reformatting will be in the following commit.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t9RQu-002Fez-AA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The switch() statement doesn't sit very well with the preceeding if()
statements, and results in excessive indentation that spoils code
readability. Continue cleaning this up by converting the MLO_AN_PHY
case to use an if() statmeent.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t9RQp-002Fet-5W@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The switch() statement doesn't sit very well with the preceeding if()
statements, and results in excessive indentation that spoils code
readability. Begin cleaning this up by converting the MLO_AN_FIXED case
to an if() statement.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t9RQk-002Fen-1A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the handling of manual flow control configuration to a common
location during resolve. We currently evaluate this for all but
fixed links.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1t9RQe-002Feh-T1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Serialization of PHC read with FW reset mechanism uses ptp_lock which
also protects timecounter updates. This means we cannot grab it when
called from bnxt_cc_read(). Let's move locking into different function.
Fixes: 6c0828d00f07 ("bnxt_en: replace PTP spinlock with seqlock")
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For per-netns RTNL, we need to prefetch the peer device's netns.
Let's set rtnl_link_ops.peer_type and accordingly remove duplicated
validation in ->newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241108004823.29419-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For per-netns RTNL, we need to prefetch the peer device's netns.
Let's set rtnl_link_ops.peer_type and accordingly remove duplicated
validation in ->newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241108004823.29419-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For per-netns RTNL, we need to prefetch the peer device's netns.
Let's set rtnl_link_ops.peer_type and accordingly remove duplicated
validation in ->newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241108004823.29419-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
link_ops is protected by link_ops_mutex and no longer needs RTNL,
so we have no reason to have __rtnl_link_register() separately.
Let's remove it and call rtnl_link_register() from ifb.ko and
dummy.ko.
Note that both modules' init() work on init_net only, so we need
not export pernet_ops_rwsem and can use rtnl_net_lock() there.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241108004823.29419-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rtnl_link_unregister() holds RTNL and calls __rtnl_link_unregister(),
where we call synchronize_srcu() to wait inflight RTM_NEWLINK requests
for per-netns RTNL.
We put synchronize_srcu() in __rtnl_link_unregister() due to ifb.ko
and dummy.ko.
However, rtnl_newlink() will acquire SRCU before RTNL later in this
series. Then, lockdep will detect the deadlock:
rtnl_link_unregister() rtnl_newlink()
---- ----
lock(rtnl_mutex);
lock(&ops->srcu);
lock(rtnl_mutex);
sync(&ops->srcu);
To avoid the problem, we must call synchronize_srcu() before RTNL in
rtnl_link_unregister().
As a preparation, let's remove __rtnl_link_unregister().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241108004823.29419-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use recently added helper r8169_mod_reg8_cond() to simplify jumbo
mode configuration.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/3df1d484-a02e-46e7-8f75-db5b428e422e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
summary
The Receive Watchdog Timeout (RWT, bit[9]) is not part of Abnormal
Interrupt Summary (AIS). Move the RWT handling out of the AIS
condition statement.
From databook, the AIS is the logical OR of the following interrupt bits:
- Bit 1: Transmit Process Stopped
- Bit 7: Receive Buffer Unavailable
- Bit 8: Receive Process Stopped
- Bit 10: Early Transmit Interrupt
- Bit 12: Fatal Bus Error
- Bit 13: Context Descriptor Error
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241107063637.2122726-4-leyfoon.tan@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to mask off the bits, we need to use the '~' operator to invert
all the bits of _MASK and clear them.
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241107063637.2122726-3-leyfoon.tan@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RTC fields are located in bits [1:0]. Correct the _MASK and _SHIFT
macros to use the appropriate mask and shift.
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241107063637.2122726-2-leyfoon.tan@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for configuring MDI-X state of PHY.
Add reporting of resolved MDI-X state in status information.
Tested on AQR113C.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Link: https://patch.msgid.link/20241106222057.3965379-1-paul.davey@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for VLAN addition/deletion in HSR mode.
In HSR mode, even if the host port is not a member of
the VLAN domain, the slave ports should simply forward the
frames. So allow forwarding of all VLAN frames in HSR mode.
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241106091710.3308519-4-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce ksz_parse_dt_phy_config() to validate and parse PHY
configuration from the device tree for KSZ switches. This function
ensures proper setup of internal PHYs by checking `phy-handle`
properties, verifying expected PHY IDs, and handling parent node
mismatches. Sets the PHY mask on the MII bus if validation is
successful. Returns -EINVAL on configuration errors.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241106075942.1636998-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement side MDIO channel support for LAN937x switches, providing an
alternative to SPI for PHY management alongside existing SPI-based
switch configuration. This is needed to reduce SPI load, as SPI can be
relatively expensive for small packets compared to MDIO support.
Also, implemented static mappings for PHY addresses for various LAN937x
models to support different internal PHY configurations. Since the PHY
address mappings are not equal to the port indexes, this patch also
provides PHY address calculation based on hardware strapping
configuration.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241106075942.1636998-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace repeated cleanup code with a single error path using a label.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241106075942.1636998-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for accessing PHYs via a side MDIO interface in LAN937x
switches. The existing code already supports accessing PHYs via main
management interfaces, which can be SPI, I2C, or MDIO, depending on the
chip variant. This patch enables using a side MDIO bus, where SPI is
used for the main switch configuration and MDIO for managing the
integrated PHYs. On LAN937x, this is optional, allowing them to operate
in both configurations: SPI only, or SPI + MDIO. Typically, the SPI
interface is used for switch configuration, while MDIO handles PHY
management.
Additionally, update interrupt controller code to support non-linear
port to PHY address mapping, enabling correct interrupt handling for
configurations where PHY addresses do not directly correspond to port
indexes. This change ensures that the interrupt mechanism properly
aligns with the new, flexible PHY address mappings introduced by side
MDIO support.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241106075942.1636998-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that. when the driver applies the
affinities by itself, it breaks the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. atlantic device reopening will resets the affinity
in aq_ndev_open().
3. atlantic has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241107120739.415743-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that. when the driver applies the
affinities by itself, it breaks the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. nfp device reopening will resets the affinity
in nfp_net_netdev_open().
3. nfp has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241107115002.413358-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that. when the driver applies the
affinities by itself, it breaks the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. bnxt_en device reopening will resets the affinity
in bnxt_open().
3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add devlink knobs to enable/disable counters on NPC
default rule entries.
Sample command to enable default rule counters:
devlink dev param set <dev> name npc_def_rule_cntr value true cmode runtime
Sample command to read the counter:
cat /sys/kernel/debug/cn10k/npc/mcam_rules
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Link: https://patch.msgid.link/20241105125620.2114301-3-lcherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce lowlevel variant of rvu_mcam_remove/add_counter_from/to_rule
for better code reuse, which assumes necessary locks are taken at
higher level.
These low level functions would be used for implementing default rule
counter APIs in the subsequent patch.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241105125620.2114301-2-lcherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|