|
This frees "mac" and tries to display its address as part of the error
message on the next line. Swap the order.
Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
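The ordering issue described in the commit above can be illustrated with a minimal userspace sketch. The names `struct mac_priv` and `report_and_release()` are hypothetical and are not taken from the Sunplus driver; the sketch only shows that the message has to be formatted before the object is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-port state the commit calls "mac". */
struct mac_priv {
	unsigned char addr[6];
};

/*
 * Build the error message while "mac" is still valid, then free it.
 * Swapping the two steps would make snprintf() read freed memory.
 * Returns the snprintf() result (length of the formatted message).
 */
static int report_and_release(struct mac_priv *mac, char *msg, size_t len)
{
	int n;

	assert(mac != NULL && msg != NULL);

	n = snprintf(msg, len, "invalid MAC %02x:%02x:%02x:%02x:%02x:%02x",
		     mac->addr[0], mac->addr[1], mac->addr[2],
		     mac->addr[3], mac->addr[4], mac->addr[5]);
	free(mac);	/* the last use of "mac" is above this line */
	return n;
}
```

The caller must not touch `mac` after the call, since ownership ends inside the helper.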
|
|
Add support for port mirroring. It is possible to mirror only one port
at a time and it is possible to have both ingress and egress mirroring.
Frames injected by the CPU don't get egress mirrored because they are
bypassing the analyzer module.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for port policing. Policing is possible only on the
ingress side. Adding policer support also required adding
tc-matchall classifier offload support.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch optimizes the RX buffer management by using the page
pool. The purpose of this change is to prepare for the upcoming
XDP support. The current driver uses one frame per page for easy
management.
Added the __maybe_unused attribute to the following functions to avoid
compiler warnings. Those functions will be removed by a separate
patch once this page pool solution is accepted.
- fec_enet_new_rxbdp
- fec_enet_copybreak
The following results compare the page pool implementation
with the original implementation (non page pool).
--- small packet (64 bytes) test results are almost the same
--- no matter what the implementation is
--- on both i.MX8 and i.MX6SX platforms.
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 39728 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 37.0 MBytes 311 Mbits/sec
[ 1] 1.0000-2.0000 sec 36.6 MBytes 307 Mbits/sec
[ 1] 2.0000-3.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 3.0000-4.0000 sec 37.1 MBytes 312 Mbits/sec
[ 1] 4.0000-5.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 5.0000-6.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 6.0000-7.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 7.0000-8.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 0.0000-8.0943 sec 299 MBytes 310 Mbits/sec
--- Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec
[ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec
--- Non Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec
[ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec
[ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec
[ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec
[ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec
--- Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec
[ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec
[ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec
--- Non Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec
[ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec
[ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec
[ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec
[ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec
[ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bnx2x_tpa_stop() allocates a memory chunk for new_data with
bnx2x_frag_alloc(). new_data should be freed when an error occurs.
But when "pad + len > fp->rx_buf_size" is true, bnx2x_tpa_stop() returns
without releasing the new_data, which will lead to a memory leak.
We should free the new_data with bnx2x_frag_free() when "pad + len >
fp->rx_buf_size" is true.
Fixes: 07b0f00964def8af9321cfd6c4a7e84f6362f728 ("bnx2x: fix possible panic under memory stress")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
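A minimal userspace sketch of the error path described in the commit above. `frag_alloc()`, `frag_free()` and `tpa_stop_sketch()` are hypothetical stand-ins, not the bnx2x functions, and the `live_frags` counter exists only to make a leak observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Outstanding allocations, tracked only to make the leak observable. */
static int live_frags;

/* Hypothetical userspace stand-ins for bnx2x_frag_alloc()/bnx2x_frag_free(). */
static void *frag_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_frags++;
	return p;
}

static void frag_free(void *p)
{
	if (p) {
		live_frags--;
		free(p);
	}
}

/*
 * Returns 0 and hands the buffer to the caller through *out,
 * or -1 on error with nothing left allocated.
 */
static int tpa_stop_sketch(size_t pad, size_t len, size_t rx_buf_size, void **out)
{
	void *new_data;

	assert(out != NULL);

	new_data = frag_alloc(rx_buf_size);
	if (!new_data)
		return -1;

	if (pad + len > rx_buf_size) {
		frag_free(new_data);	/* the release the fix adds on this path */
		return -1;
	}

	*out = new_data;
	return 0;
}
```

Without the `frag_free()` call in the size check, every oversized frame would leave one buffer allocated.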
|
|
Remove PTP_PF_EXTTS support for non-PCI11x1x devices since they do not support
the PTP-IO Input event triggered timestamping mechanisms added
Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As kmemdup() could return NULL, it is better to check the return
value and return an error if it fails.
Moreover, the return value of prestera_acl_ruleset_keymask_set() should
be checked and propagated to the caller.
Fixes: 604ba230902d ("net: prestera: flower template support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
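The two checks described in the commit above can be sketched in userspace as follows. `dup_bytes()` is an assumed analogue of kmemdup(), `struct ruleset`, `ruleset_keymask_set()` and `template_create()` are hypothetical names, and `simulate_oom` is a hook added purely so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Test hook (assumption): forces the next duplication to fail like an OOM. */
static int simulate_oom;

/* Userspace analogue of kmemdup(): returns NULL when allocation fails. */
static void *dup_bytes(const void *src, size_t len)
{
	void *copy;

	if (simulate_oom)
		return NULL;
	copy = malloc(len);
	if (copy)
		memcpy(copy, src, len);
	return copy;
}

struct ruleset {
	void *keymask;
};

static int ruleset_keymask_set(struct ruleset *rs, const void *keymask, size_t len)
{
	void *copy;

	assert(rs != NULL && keymask != NULL);

	copy = dup_bytes(keymask, len);
	if (!copy)
		return -ENOMEM;	/* check the result instead of storing NULL */

	free(rs->keymask);
	rs->keymask = copy;
	return 0;
}

/* The caller checks the setter's result and passes the error upward. */
static int template_create(struct ruleset *rs, const void *keymask, size_t len)
{
	int err = ruleset_keymask_set(rs, keymask, len);

	if (err)
		return err;
	return 0;
}
```

On failure the previous keymask is left untouched, so the ruleset stays in a consistent state.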
|
|
This adds support for multigig copper SFP modules from RollBall/Hilink.
These modules have a specific way to access clause 45 registers of the
internal PHY.
We also need to wait at least 22 seconds after deasserting TX disable
before accessing the PHY. The code waits for 25 seconds just to be sure.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some multigig SFPs from RollBall and Hilink do not expose functional
MDIO access to the internal PHY of the SFP via I2C address 0x56
(although there seems to be read-only clause 22 access on this address).
Instead, the PHY of these SFPs can be accessed over I2C via the SFP Enhanced
Digital Diagnostic Interface - I2C address 0x51. SFP_PAGE has to be
set to 3 and the password must be filled with 0xff bytes for this
PHY communication to work.
This extends the mdio-i2c driver to support this protocol by adding a
special parameter to mdio_i2c_alloc function via which this RollBall
protocol can be selected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of configuring the I2C mdiobus when SFP driver is probed,
create/destroy the mdiobus before the PHY is probed for/after it is
released.
This way we can tell the mdio-i2c code which protocol to use for each
SFP transceiver.
Move the code that determines MDIO I2C protocol from
sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID
parsing is done. Don't allocate I2C bus if no PHY is expected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add macros SFP_QUIRK(), SFP_QUIRK_M() and SFP_QUIRK_F() for defining SFP
quirk table entries. Use them to deduplicate the code a little bit.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some SFPs may contain an internal PHY which may in some cases want to
connect with the host interface in 1000base-x/2500base-x mode.
Do not fail if such PHY is being attached in one of these PHY interface
modes.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Select the host interface configuration according to the capabilities of
the host if the host provided them. This is currently provided only when
connecting PHY that is inside a SFP.
The PHY supports several configurations of host communication:
- always communicate with host in 10gbase-r, even if copper speed is
lower (rate matching mode),
- the same as above but use xaui/rxaui instead of 10gbase-r,
- switch host SerDes mode between 10gbase-r, 5gbase-r, 2500base-x and
sgmii according to copper speed,
- the same as above but use xaui/rxaui instead of 10gbase-r.
This mode of host communication, called MACTYPE, is by default selected
by strapping pins, but it can be changed in software.
This adds support for selecting this mode according to which modes are
supported by the host.
This allows the kernel to:
- support SFP modules with 88X33X0 or 88E21X0 inside them
Note: we use mv3310_select_mactype() for both 88X3310 and 88X3340,
although 88X3340 does not support XAUI. This is not a problem because
88X3340 does not declare XAUI in its supported_interfaces, and so this
function will never choose that MACTYPE.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[ rebase, updated, also added support for 88E21X0 ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some register definitions were defined with spaces used for indentation.
Change them to tabs.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the supported PHY interface types to phylib if the PHY we are
connecting is inside a SFP, so that the PHY driver can select an
appropriate host configuration mode for their interface according to
the host capabilities.
For example the Marvell 88X3310 PHY inside RollBall SFP modules
defaults to 10gbase-r mode on host's side, and the marvell10g
driver currently does not change this setting. But a host may not
support 10gbase-r. For example Turris Omnia only supports sgmii,
1000base-x and 2500base-x modes. The PHY can be configured to use
those modes, but in order for the PHY driver to do that, it needs
to know which modes are supported.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylink_sfp_config() now only deals with configuring the MAC for a
SFP containing a PHY. Rename it to be specific.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Where a MAC provides a phy_interface_t bitmap, use these bitmaps to
select the operating interface mode for optical SFP modules, rather
than using the linkmode bitmaps.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently parse the SFP EEPROM to a bitmap of ethtool link modes,
and then attempt to convert the link modes to a PHY interface mode.
While this works at present, there are cases where this is sub-optimal.
For example, where a module can operate with several different PHY
interface modes.
To start addressing this, arrange for the SFP EEPROM parsing to also
provide a bitmap of the possible PHY interface modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than having the ability to validate all supported interface
modes or a single interface mode, introduce the ability to validate
a subset of supported modes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ rebased on current net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the `PLATFORM_DEVID_NONE` constant instead of
hard-coding -1 when creating a platform device.
No functional changes are intended.
Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
Link: https://lore.kernel.org/r/20220930104857.2796923-1-pobrn@protonmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The idle mask is dumped during the "prepare" and "restore" stage
right now, which helps to demonstrate issues only related to the
first s2idle entry.
If the system has entered s2idle once and was then woken up without
breaking the s2idle loop, but also never went back to sleep, we
might still have another issue to deal with, however.
Move the dynamic debugging message here so that we'll catch it on
each iteration.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216516
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220929215042.745-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).
The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.
The return type of sparx5_port_xmit_impl should be changed from int to
netdev_tx_t.
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
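The type requirement described in the commit above can be sketched with local stand-ins. `tx_status_t`, `struct packet`, `struct device` and `port_xmit()` are assumptions made for illustration and only mimic the shape of netdev_tx_t and the ndo_start_xmit hook; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins (assumptions) for netdev_tx_t, sk_buff and net_device. */
typedef enum { TX_OK = 0x00, TX_BUSY = 0x10 } tx_status_t;
struct packet { size_t len; };
struct device { unsigned long tx_packets; };

/* The hook type: implementations must match this prototype exactly. */
typedef tx_status_t (*start_xmit_fn)(struct packet *pkt, struct device *dev);

struct device_ops {
	start_xmit_fn start_xmit;
};

/* Declared with tx_status_t, not int, so the definition matches the hook. */
static tx_status_t port_xmit(struct packet *pkt, struct device *dev)
{
	assert(pkt != NULL && dev != NULL);

	if (pkt->len == 0)
		return TX_BUSY;	/* toy model: refuse empty packets */
	dev->tx_packets++;
	return TX_OK;
}

static const struct device_ops port_ops = {
	.start_xmit = port_xmit,
};
```

Because the definition and the hook share one prototype, no cast is needed in the initializer, which is the property a forward-edge type check relies on.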
|
|
Also updates the documentation accordingly.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/YznOUQ7Pijedu0NW@monster.localdomain
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Add missing DT bindings for STM32 and a resource leak fix for DaVinci"
* tag 'i2c-for-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe
dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property
dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
|
|
Fix scale factors for reading MPS Multi-phase mp2888 controller.
Fixed sensors:
- PIN/POUT: based on vendor documentation, set scale factor 0.5 W/LSB
- IOUT: based on vendor documentation, set scale factor 0.25 A/LSB
Fixes: e4db7719d037 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2888 controller")
Signed-off-by: Oleksandr Shamray <oleksandrs@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20220929121642.63051-1-oleksandrs@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
When the 'unused-but-set-variable' compile warning option is
enabled, the following warning is raised:
drivers/hwmon/nct6683.c:415:9:
warning: variable 'j' set but not used [-Wunused-but-set-variable]
Variable 'j' in nct6683_create_attr_group is unused,
so remove it and simplify the 'for' loop.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Link: https://lore.kernel.org/r/20220927114352.2498079-1-zengheng4@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Some constants need 'UL' markings, otherwise they are shifted into the
sign bit.
Fixes: 361693697249 ("i2c: microchip: pci1xxxx: Add driver for I2C host controller in multifunction endpoint of pci1xxxx switch")
Signed-off-by: Wolfram Sang <wsa@kernel.org>
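A small sketch of why the 'UL' suffix matters in the commit above. The register names `STATUS_BUSY` and `ADDR_MASK` are hypothetical and do not come from the pci1xxxx driver; written as `(1 << 31)` or `(0xFF << 24)`, the constants would have type int and the shift would reach its sign bit.

```c
#include <assert.h>

/*
 * Hypothetical register layout. The UL suffix makes the shifted operand
 * unsigned long, so bit 31 is an ordinary value bit rather than a sign bit.
 */
#define STATUS_BUSY	(1UL << 31)
#define ADDR_SHIFT	24
#define ADDR_MASK	(0xFFUL << ADDR_SHIFT)

/* Extract the address field from a 32-bit register value. */
static unsigned long addr_field(unsigned long reg)
{
	assert(reg <= 0xFFFFFFFFUL);
	return (reg & ADDR_MASK) >> ADDR_SHIFT;
}

/* Report whether the busy bit (bit 31) is set. */
static int is_busy(unsigned long reg)
{
	return (reg & STATUS_BUSY) != 0;
}
```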
|
|
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/218
Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
This i801 driver probe can take more than ~190ms in some devices, since
the "i2c_register_spd()" call was added inside
"i801_probe_optional_slaves()".
Prefer async probe so that other drivers can be probed and boot can
continue in parallel while this driver loads, to reduce boot time. There is
no reason to block other drivers from probing while this driver is
loading.
Signed-off-by: Mani Milani <mani@chromium.org>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
pm_runtime_enable() decrements the power disable depth. Thus a
pairing increment is needed on the error handling path to keep
it balanced according to context.
Fixes: 17f88151ff190 ("i2c: davinci: Add PM Runtime Support")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The pattern
foo = kmalloc(sizeof(*foo), GFP_KERNEL);
has an advantage when the type of foo is changed. Since we are planning such
a change, it is better to be prepared by using the standard pattern for memory allocation.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
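A userspace sketch of the allocation pattern from the commit above, using malloc() in place of kmalloc(). `struct dev_state` and `dev_state_new()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structure; its fields can change without touching the allocation. */
struct dev_state {
	int id;
	unsigned long flags;
};

static struct dev_state *dev_state_new(int id)
{
	struct dev_state *st;

	assert(id >= 0);

	/* sizeof(*st) follows the declared type of st automatically. */
	st = malloc(sizeof(*st));
	if (!st)
		return NULL;

	st->id = id;
	st->flags = 0;
	return st;
}
```

If the type of `st` is later changed, the size expression stays correct, whereas `sizeof(struct dev_state)` would have to be updated by hand.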
|
|
The code is organized in a way that all parts related
to a certain platform quirk go together. This is not
the case for AMD NAVI. Shuffle the code to make it happen.
While at it, drop the frequency definition and use
a hard-coded value as is done for other platforms, and
add a comment to the PCI ID list.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Remove extra whitespace and add a missing word to a sentence describing
get_random_bytes().
Signed-off-by: William Zijl <postmaster@gusted.xyz>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS to work the
same for regular traffic, without the need to reconfigure RSS to exclude XSK
queues.
However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.
This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.
As the result of this transition:
1. It's possible to use RSS contexts over XSK RQs.
2. It's possible to dedicate all queues to XSK.
3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
setting up the XDP program to return XDP_PASS for non-XSK traffic.
4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a function to flush an RQ: clean up descriptors, release pages and
reset the RQ. This procedure is used by the recovery flow, and it will
also be used in a following commit to free some memory when switching a
channel to the XSK mode.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for XDP metadata on XSK RQs for cross-program
communication. The driver no longer calls xdp_set_data_meta_invalid and
copies the metadata to a newly allocated SKB on XDP_PASS.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling
mlx5e_page_release for ones that are not scheduled for XDP_TX or
XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a
regular one for each page/XSK frame. This check can be moved outside the
loop to reduce the number of branches.
mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release
for the ones that are last in a page; and mlx5e_page_release checks
whether it's an XSK RQ or a regular one for each fragment. Using the
fact that XSK doesn't support multiple fragments, it can be optimized
for both XSK and regular usages:
1. Make an early check for XSK and call its deallocator directly, saving
3 branches (loop condition, frag->last_in_page and selection of
deallocator).
2. Call the regular deallocator directly in the non-XSK case, saving a
branch per fragment, except the first one.
After the changes, mlx5e_page_release is removed, as there are no
callers left.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_page_release calls the appropriate deallocator depending on
whether it's an XSK RQ or a regular one. Some flows that call this
function are not compatible with XSK, so they can call the non-XSK
deallocator directly to save a branch.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The SHAMPO flow is not compatible with XSK, so it can call the page pool
allocator directly to save a branch.
mlx5e_page_alloc is removed, as it's no longer used in any flow.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on striding RQ and
creates an optimized flow for XSK. A side effect is an opportunity to
optimize the regular RX flow by dropping branching for XSK cases.
Performance improvement is up to 6.4% in the aligned mode and up to 7.5%
in the unaligned mode.
Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on legacy RQ, adding
a special case for XSK. The new branch introduced basically replaces the
branch that was removed from the same place a few commits before.
A check is made that DMA sync is not needed, because the batching
allocator falls back to returning one frame when DMA sync is needed, and
this is best handled by the loop in the standard case.
Performance improvement is up to 8% in the aligned mode and up to 9% in
the unaligned mode.
Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allocation of XSK frames on legacy RQ may be made more efficient with a
specialized routine that relies on certain assumptions, such as that there is
only one fragment and that allocation units (XSK frames) are not shared among
multiple packets. It reduces the number of branches both in the XSK code
and in the regular RQ, because with this approach there is only a single
check whether it's an XSK or regular RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As
partial batches are allowed, there is no point in having a loop in a loop,
so the outer loop is removed, and the batch size is increased up to the
total number of WQEs to allocate, still not smaller than 8.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The previous commit allowed allocating WQE batches in legacy RQ
partially; however, XSK still checks whether there are enough frames in
the fill ring. Remove this check to allow partial batch allocation
with XSK as well.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Legacy RQ allocates WQEs in batches. If the batch allocation fails, the
pages of the allocated part are released. This commit changes this
behavior to allow using the pages that have already been allocated.
After this change, we need to be careful about indexing rq->wqe.frags[].
The WQ size is a power of two that is divisible by wqe_bulk (8), and the old
code used whole bulks, which allowed using indices [8*K; 8*K+7] without
overflowing. Now that the bulks may be partial, the range can start at
any location (not only at 8*K), so we need to wrap them around to avoid
out-of-bounds array access.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
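The wrap-around described in the commit above can be sketched as a small helper. `wrapped_index()` is a hypothetical name, and the sketch assumes only what the message states: the ring size is a power of two and a bulk may start at any position.

```c
#include <assert.h>

/*
 * Index of the i-th entry of a bulk that starts at "start" in a ring of
 * wq_size entries. wq_size must be a power of two, so masking with
 * (wq_size - 1) is the same as taking the remainder.
 */
static unsigned int wrapped_index(unsigned int start, unsigned int i,
				  unsigned int wq_size)
{
	assert(wq_size != 0 && (wq_size & (wq_size - 1)) == 0);
	return (start + i) & (wq_size - 1);
}
```

With a 64-entry ring, a bulk starting at 8 uses indices 8 to 15 as before, while a partial bulk starting at 61 continues at 0 instead of running past the end of the array.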
|
|
The old calculation of wqe_index_mask may give false positives, i.e.
request bulking of pairs of WQEs when not strictly needed, for example,
when the first fragment size is equal to the PAGE_SIZE, bulking is not
needed, even if the number of fragments is odd.
Make the calculation more exact to cut false positives.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When fragments of different WQEs share the same page, mlx5e_post_rx_wqes
must wait until the old WQE stops using the page; only then can the new WQE
allocate the new page. Essentially, it means that if WQE index i is
still in use, the allocation must stop before `i % bulk`, where bulk is
the number of WQEs that may share the same page.
As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the
new wqe_index_mask field will be equal to `bulk - 1`.
At the same time, wqe_bulk remains for optimization purposes and stores
`max(bulk, 8)`, which allows skipping the allocation until we have at
least 8 WQEs free.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
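The relationship between bulk, wqe_index_mask and wqe_bulk in the commit above can be sketched as follows. `struct bulk_params`, `bulk_params_init()` and `index_in_bulk()` are hypothetical helpers; only the two formulas, `bulk - 1` and `max(bulk, 8)`, come from the message.

```c
#include <assert.h>

struct bulk_params {
	unsigned int wqe_index_mask;	/* bulk - 1 */
	unsigned int wqe_bulk;		/* max(bulk, 8) */
};

/* bulk is the number of WQEs that may share a page; it must be a power of two. */
static struct bulk_params bulk_params_init(unsigned int bulk)
{
	struct bulk_params p;

	assert(bulk != 0 && (bulk & (bulk - 1)) == 0);
	p.wqe_index_mask = bulk - 1;
	p.wqe_bulk = bulk > 8 ? bulk : 8;
	return p;
}

/* Position of WQE index i inside its bulk: i % bulk computed with the mask. */
static unsigned int index_in_bulk(unsigned int i, const struct bulk_params *p)
{
	return i & p->wqe_index_mask;
}
```

For bulk = 4 the mask is 3 and wqe_bulk is 8; for bulk = 16 the mask is 15 and wqe_bulk is 16.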
|
|
The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates
that XSK queues are open, but not necessarily activated. This check is
not very useful, because:
0. Both XSK setup and netdev state transitions take the same state_lock
mutex, so they can't happen at the same time.
1. If the netdev is up, xsk_is_bound can return true only when
MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel.
mlx5e_xsk_wakeup is only called when xsk_is_bound is true.
2. If the XSK socket is bound, and the netdev is going up or down,
mlx5e_xsk_wakeup can take one of two branches, depending on the return
value of napi_if_scheduled_mark_missed:
2.1. True means one of two things: either NAPI was enabled at this
point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was
disabled, and nothing really happened.
2.2. False means that NAPI was enabled by this point, which also implies
MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains
a following check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this
flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels.
As checking this flag doesn't cut any flows, remove the check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking
a spinlock to protect from concurrent access. There is already a
function that does the same: mlx5e_trigger_napi_icosq. Use this function
in mlx5e_xsk_wakeup.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If create_singlethread_workqueue() fails, it returns a NULL pointer.
Replace the IS_ERR() check with a NULL pointer check.
Fixes: 233cb8a47d65 ("power: supply: mt6370: Add MediaTek MT6370 charger driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: ChiaEn Wu <chiaen_wu@richtek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
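A userspace sketch of why an IS_ERR()-style check misses a NULL return, as in the commit above. `is_err()` mirrors the kernel's error-pointer range test under the assumption that error codes occupy the top 4095 addresses; `check_workqueue()` is a hypothetical helper, not code from the mt6370 driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace mirror of IS_ERR(): true only for error-encoded pointers. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Validate a handle from an allocator that reports failure as NULL. */
static int check_workqueue(const void *wq)
{
	assert(!is_err(wq));	/* this allocator never returns error pointers */
	if (!wq)
		return -ENOMEM;
	return 0;
}
```

Since `is_err(NULL)` is false, a failed allocation would pass an IS_ERR()-only check and be dereferenced later; the explicit NULL test closes that gap.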
|