Age | Commit message (Collapse) | Author |
|
Add support for port police. It is possible to police only on the
ingress side. To be able to add police support also it was required to
add tc-matchall classifier offload support.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch optimizes the RX buffer management by using the page
pool. The purpose for this change is to prepare for the following
XDP support. The current driver uses one frame per page for easy
management.
Added __maybe_unused attribute to the following functions to avoid
the compiling warning. Those functions will be removed by a separate
patch once this page pool solution is accepted.
- fec_enet_new_rxbdp
- fec_enet_copybreak
The following are the comparing result between page pool implementation
and the original implementation (non page pool).
--- small packet (64 bytes) testing are almost the same
--- no matter what the implementation is
--- on both i.MX8 and i.MX6SX platforms.
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 39728 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 37.0 MBytes 311 Mbits/sec
[ 1] 1.0000-2.0000 sec 36.6 MBytes 307 Mbits/sec
[ 1] 2.0000-3.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 3.0000-4.0000 sec 37.1 MBytes 312 Mbits/sec
[ 1] 4.0000-5.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 5.0000-6.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 6.0000-7.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 7.0000-8.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 0.0000-8.0943 sec 299 MBytes 310 Mbits/sec
--- Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec
[ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec
--- Non Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec
[ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec
[ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec
[ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec
[ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec
--- Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec
[ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec
[ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec
--- Non Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec
[ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec
[ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec
[ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec
[ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec
[ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bnx2x_tpa_stop() allocates a memory chunk from new_data with
bnx2x_frag_alloc(). The new_data should be freed when gets some error.
But when "pad + len > fp->rx_buf_size" is true, bnx2x_tpa_stop() returns
without releasing the new_data, which will lead to a memory leak.
We should free the new_data with bnx2x_frag_free() when "pad + len >
fp->rx_buf_size" is true.
Fixes: 07b0f00964def8af9321cfd6c4a7e84f6362f728 ("bnx2x: fix possible panic under memory stress")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove PTP_PF_EXTTS support for non-PCI11x1x devices since they do not support
the PTP-IO Input event triggered timestamping mechanisms added
Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the kemdup could return NULL, it should be better to check the return
value and return error if fails.
Moreover, the return value of prestera_acl_ruleset_keymask_set() should
be checked by cascade.
Fixes: 604ba230902d ("net: prestera: flower template support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Taras Chornyi<tchornyi@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds support for multigig copper SFP modules from RollBall/Hilink.
These modules have a specific way to access clause 45 registers of the
internal PHY.
We also need to wait at least 22 seconds after deasserting TX disable
before accessing the PHY. The code waits for 25 seconds just to be sure.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some multigig SFPs from RollBall and Hilink do not expose functional
MDIO access to the internal PHY of the SFP via I2C address 0x56
(although there seems to be read-only clause 22 access on this address).
Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced
Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be
selected to 3 and the password must be filled with 0xff bytes for this
PHY communication to work.
This extends the mdio-i2c driver to support this protocol by adding a
special parameter to mdio_i2c_alloc function via which this RollBall
protocol can be selected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of configuring the I2C mdiobus when SFP driver is probed,
create/destroy the mdiobus before the PHY is probed for/after it is
released.
This way we can tell the mdio-i2c code which protocol to use for each
SFP transceiver.
Move the code that determines MDIO I2C protocol from
sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID
parsing is done. Don't allocate I2C bus if no PHY is expected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add macros SFP_QUIRK(), SFP_QUIRK_M() and SFP_QUIRK_F() for defining SFP
quirk table entries. Use them to deduplicate the code a little bit.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some SFPs may contain an internal PHY which may in some cases want to
connect with the host interface in 1000base-x/2500base-x mode.
Do not fail if such PHY is being attached in one of these PHY interface
modes.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Select the host interface configuration according to the capabilities of
the host if the host provided them. This is currently provided only when
connecting PHY that is inside a SFP.
The PHY supports several configurations of host communication:
- always communicate with host in 10gbase-r, even if copper speed is
lower (rate matching mode),
- the same as above but use xaui/rxaui instead of 10gbase-r,
- switch host SerDes mode between 10gbase-r, 5gbase-r, 2500base-x and
sgmii according to copper speed,
- the same as above but use xaui/rxaui instead of 10gbase-r.
This mode of host communication, called MACTYPE, is by default selected
by strapping pins, but it can be changed in software.
This adds support for selecting this mode according to which modes are
supported by the host.
This allows the kernel to:
- support SFP modules with 88X33X0 or 88E21X0 inside them
Note: we use mv3310_select_mactype() for both 88X3310 and 88X3340,
although 88X3340 does not support XAUI. This is not a problem because
88X3340 does not declare XAUI in it's supported_interfaces, and so this
function will never choose that MACTYPE.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[ rebase, updated, also added support for 88E21X0 ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some register definitions were defined with spaces used for indentation.
Change them to tabs.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the supported PHY interface types to phylib if the PHY we are
connecting is inside a SFP, so that the PHY driver can select an
appropriate host configuration mode for their interface according to
the host capabilities.
For example the Marvell 88X3310 PHY inside RollBall SFP modules
defaults to 10gbase-r mode on host's side, and the marvell10g
driver currently does not change this setting. But a host may not
support 10gbase-r. For example Turris Omnia only supports sgmii,
1000base-x and 2500base-x modes. The PHY can be configured to use
those modes, but in order for the PHY driver to do that, it needs
to know which modes are supported.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylink_sfp_config() now only deals with configuring the MAC for a
SFP containing a PHY. Rename it to be specific.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Where a MAC provides a phy_interface_t bitmap, use these bitmaps to
select the operating interface mode for optical SFP modules, rather
than using the linkmode bitmaps.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently parse the SFP EEPROM to a bitmap of ethtool link modes,
and then attempt to convert the link modes to a PHY interface mode.
While this works at present, there are cases where this is sub-optimal.
For example, where a module can operate with several different PHY
interface modes.
To start addressing this, arrange for the SFP EEPROM parsing to also
provide a bitmap of the possible PHY interface modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than having the ability to validate all supported interface
modes or a single interface mode, introduce the ability to validate
a subset of supported modes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ rebased on current net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).
The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.
The return type of sparx5_port_xmit_impl should be changed from int to
netdev_tx_t.
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS work the
same for regular traffic, without need to reconfigure RSS to exclude XSK
queues.
However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.
This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.
As the result of this transition:
1. It's possible to use RSS contexts over XSK RQs.
2. It's possible to dedicate all queues to XSK.
3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
settings up the XDP program to return XDP_PASS for non-XSK traffic.
4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a function to flush an RQ: clean up descriptors, release pages and
reset the RQ. This procedure is used by the recovery flow, and it will
also be used in a following commit to free some memory when switching a
channel to the XSK mode.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for XDP metadata on XSK RQs for cross-program
communication. The driver no longer calls xdp_set_data_meta_invalid and
copies the metadata to a newly allocated SKB on XDP_PASS.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling
mlx5e_page_release for ones that are not scheduled for XDP_TX or
XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a
regular one for each page/XSK frame. This check can be moved outside the
loop to reduce the number of branches.
mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release
for the ones that are last in a page; and mlx5e_page_release checks
whether it's an XSK RQ or a regular one for each fragment. Using the
fact that XSK doesn't support multiple fragments, it can be optimized
for both XSK and regular usages:
1. Make an early check for XSK and call its deallocator directly, saving
3 branches (loop condition, frag->last_in_page and selection of
deallocator).
2. Call the regular deallocator directly in the non-XSK case, saving a
branch per fragment, except the first one.
After the changes, mlx5e_page_release is removed, as there are no
callers left.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_page_release calls the appropriate deallocator depending on
whether it's an XSK RQ or a regular one. Some flows that call this
function are not compatible with XSK, so they can call the non-XSK
deallocator directly to save a branch.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The SHAMPO flow is not compatible with XSK, it can call the page pool
allocator directly to save a branch.
mlx5e_page_alloc is removed, as it's no longer used in any flow.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on striding RQ and
creates an optimized flow for XSK. A side effect is an opportunity to
optimize the regular RX flow by dropping branching for XSK cases.
Performance improvement is up to 6.4% in the aligned mode and up to 7.5%
in the unaligned mode.
Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on legacy RQ, adding
a special case for XSK. The new branch introduced basically replaces the
branch that was removed from the same place a few commits before.
A check is made that DMA sync is not needed, because the batching
allocator falls back to returning one frame when DMA sync is needed, and
this is best handled by the loop in the standard case.
Performance improvement is up to 8% in the aligned mode and up to 9% in
the unaligned mode.
Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allocation of XSK frames on legacy RQ may be made more efficient with a
specialized routine that relies on certain assumptions, such as there is
only one fragment, allocation units (XSK frames) are not shared among
multiple packets. It reduces the number of branches both in the XSK code
and in the regular RQ, because with this approach there is only a single
check whether it's an XSK or regular RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As
partial batches are allowed, there is no point to have a loop in a loop,
so the outer loop is removed, and the batch size is increased up to the
total number of WQEs to allocate, still not smaller than 8.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The previous commit allowed allocating WQE batches in legacy RQ
partially, however, XSK still checks whether there are enough frames in
the fill ring. Remove this check to allow to allocate batches partially
also with XSK.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Legacy RQ allocates WQEs in batches. If the batch allocation fails, the
pages of the allocated part are released. This commit changes this
behavior to allow to use the pages that have been already allocated.
After this change, we need to be careful about indexing rq->wqe.frags[].
The WQ size is a power of two that divides by wqe_bulk (8), and the old
code used whole bulks, which allowed to use indices [8*K; 8*K+7] without
overflowing. Now that the bulks may be partial, the range can start at
any location (not only at 8*K), so we need to wrap them around to avoid
out-of-bounds array access.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The old calculation of wqe_index_mask may give false positives, i.e.
request bulking of pairs of WQEs when not strictly needed, for example,
when the first fragment size is equal to the PAGE_SIZE, bulking is not
needed, even if the number of fragments is odd.
Make the calculation more exact to cut false positives.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When fragments of different WQEs share the same page, mlx5e_post_rx_wqes
must wait until the old WQE stops using the page, only then the new WQE
can allocate the new page. Essentially, it means that if WQE index i is
still in use, the allocation must stop before `i % bulk`, where bulk is
the number of WQEs that may share the same page.
As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the
new wqe_index_mask field will be equal to `bulk - 1`.
At the same time, wqe_bulk remains for optimization purposes and stores
`max(bulk, 8)`, which allows to skip the allocation until we have at
least 8 WQEs free.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates
that XSK queues are open, but not necessarily activated. This check is
not very useful, because:
0. Both XSK setup and netdev state transitions take the same state_lock
mutex, so they can't happen at the same time.
1. If the netdev is up, xsk_is_bound can return true only when
MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel.
mlx5e_xsk_wakeup is only called when xsk_is_bound is true.
2. If the XSK socket is bound, and the netdev is going up or down,
mlx5e_xsk_wakeup can take one of two branches, depending on the return
value of napi_if_scheduled_mark_missed:
2.1. True means one of two things: either NAPI was enabled at this
point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was
disabled, and nothing really happened.
2.2. False means that NAPI was enabled by this point, which also implies
MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains
a following check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this
flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels.
As checking this flag doesn't cut any flows, remove the check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking
a spinlock to protect from concurrent access. There is already a
function that does the same: mlx5e_trigger_napi_icosq. Use this function
in mlx5e_xsk_wakeup.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Setting ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear
routine as done by commit 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc:
fix typo in __mtk_foe_entry_clear") breaks flow offloading, at least
on older MTK_NETSYS_V1 SoCs, OpenWrt users have confirmed the bug on
MT7622 and MT7621 systems.
Felix Fietkau suggested to use MTK_FOE_STATE_INVALID instead which
works well on both, MTK_NETSYS_V1 and MTK_NETSYS_V2.
Tested on MT7622 (Linksys E8450) and MT7986 (BananaPi BPI-R3).
Suggested-by: Felix Fietkau <nbd@nbd.name>
Fixes: 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear")
Fixes: 33fc42de33278b ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/YzY+1Yg0FBXcnrtc@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support restart of link auto-negotiation.
This may be initiated using:
# ethtool -r <intf>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Report the auto negotiation capability if it's supported
in management firmware, and advertise it if it's enabled.
Changing port speed is not allowed when autoneg is enabled.
The ethtool <intf> command displays the auto-neg capability:
# ethtool enp1s0np0
Settings for enp1s0np0:
Supported ports: [ FIBRE ]
Supported link modes: Not reported
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: Not reported
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Speed: 25000Mb/s
Duplex: Full
Auto-negotiation: on
Port: FIBRE
PHYAD: 0
Transceiver: internal
Link detected: yes
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Considering that whether application firmware is indifferent to
port speed is a firmware property instead of port property, now use
a new rtsym to get the property instead of parsing per-port tlv caps.
With this change, relevant code is moved to `nfp_main` layer.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's not a fatal error when setting `hwinfo` into management firmware
fails, no need to halt the whole driver initialization process.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The latest management firmware can now report the active FEC
mode. Adapt driver accordingly so that user can get the active
FEC mode by running command:
# ethtool --show-fec <intf>
Also correct use of `fec` field.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rtl_disable_rxdvgate() is used for disable RXDV_GATE. It is opposite function
of rtl_enable_rxdvgate().
Disable RXDV_GATE does not have to delay. So in this patch, also remove the
delay after disale RXDV_GATE.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Link: https://lore.kernel.org/r/20220928171356.3951-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.1
Few stack changes and lots of driver changes in this round. brcmfmac
has more activity as usual and it gets new hardware support. ath11k
improves WCN6750 support and also other smaller features. And of
course changes all over.
Note: in early September wireless tree was merged to wireless-next to
avoid some conflicts with mac80211 patches, this shouldn't cause any
problems but wanted to mention anyway.
Major changes:
mac80211
- refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO)
feature continues
brcmfmac
- support CYW43439 SDIO chipset
- support BCM4378 on Apple platforms
- support CYW89459 PCIe chipset
rtw89
- more work to get rtw8852c supported
- P2P support
- support for enabling and disabling MSDU aggregation via nl80211
mt76
- tx status reporting improvements
ath11k
- cold boot calibration support on WCN6750
- Target Wake Time (TWT) debugfs support for STA interface
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- implement SRAM dump debugfs interface
- enable threaded NAPI on all hardware
- WoW support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
wcn36xx
- add SNR from a received frame as a source of system entropy
* tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits)
wifi: rtl8xxxu: Improve rtl8xxxu_queue_select
wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
wifi: rtl8xxxu: gen2: Enable 40 MHz channel width
wifi: rtw89: 8852b: configure DLE mem
wifi: rtw89: check DLE FIFO size with reserved size
wifi: rtw89: mac: correct register of report IMR
wifi: rtw89: pci: set power cut closed for 8852be
wifi: rtw89: pci: add to do PCI auto calibration
wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf
wifi: rtw89: add DMA busy checking bits to chip info
wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels
wifi: rtw89: pci: mask out unsupported TX channels
iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
wifi: iwlwifi: Track scan_cmd allocation size explicitly
brcmfmac: Remove the call to "dtim_assoc" IOVAR
brcmfmac: increase dcmd maximum buffer size
brcmfmac: Support 89459 pcie
brcmfmac: increase default max WOWL patterns to 16
cw1200: fix incorrect check to determine if no element is found in list
...
====================
Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Although mlx5e_rq_free_shampo can be called unconditionally, it belongs
to case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ. Move it there to allow to
add more init/cleanup actions to the striding RQ case.
If xdp_rxq_info_reg_mem_model fails, don't forget to destroy the page
pool.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The same clear_bit is called in both error and success flows. Move the
call to do it only once and remove the out label.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To decrease the nesting level and reduce duplication of code, create
functions to redirect direct RQTs to the actual RQs or drop_rq, which
are used in the activation and deactivation flows of channels.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_xsk_page_alloc_pool became a thin wrapper around xsk_buff_alloc.
Drop it and call xsk_buff_alloc directly.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
struct mlx5e_alloc_unit consists of a single union. Convert it to a
union itself to simplify casting it to struct xdp_buff *, which will be
used to implement XSK batching on striding RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_alloc_unit stores the DMA address and a pointer to either struct
page (regular RQ) or struct xdp_buff (XSK RQ). This DMA address is
redundant, because when a page or an XSK frame is allocated, the same
address is also stored there. Some flows take the address from struct
mlx5e_alloc_unit, and some take it from struct page or xdp_buff.
This commit removes the address from struct mlx5e_alloc_unit, which
makes it twice as small and improves locality (this struct is used in an
array), also saving on unnecessary stores to the addr field. Almost all
flows know unambiguously whether the DMA address should be taken from
page or from xdp_buff. The exception is the allocation flows, where a
new branch appeared, which will be optimized out in the next commits.
struct mlx5e_alloc_unit used to be called mlx5e_dma_info.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next commit will remove the DMA address from the struct currently
called mlx5e_dma_info, because the same value can be retrieved with
page_pool_get_dma_addr(page) in almost all cases, with the notable
exception of SHAMPO (HW GRO implementation) that modifies this address
on the fly, after the initial allocation.
To keep the SHAMPO logic intact, struct mlx5e_dma_info remains in the
SHAMPO code, consisting of addr and page (XSK is not compatible with
SHAMPO). The struct used in all other places is renamed to
mlx5e_alloc_unit, allowing the next commit to remove the addr field
without affecting SHAMPO.
The new name means "allocation unit", and it's more appropriate after
the field with the DMA address gets removed.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RX page cache stores dma_info structs, that consist of a pointer to
struct page and a DMA address. In fact, the DMA address is extracted
from struct page using page_pool_get_dma_addr when a page is pushed to
the cache. By moving this call to the point when a page is popped from
the cache, we can avoid storing the DMA address in the cache,
effectively reducing its size by two times without losing any
functionality.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|