Age | Commit message (Collapse) | Author |
|
Set the TSO driver limit to GSO_MAX_SIZE (512 KB).
This allows the admin/user to set a GSO limit up to this value.
ip link set dev veth10 gso_max_size 200000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the driver limit to GSO_MAX_SIZE (512 KB).
This allows the admin/user to set a GSO limit up to this value.
Tested:
ip link set dev lo gso_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &
tcpdump shows :
18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the gro_max_size to exceed a value larger than 65536.
There weren't really any external limitations that prevented this other
than the fact that IPv4 only supports a 16 bit length field. Since we have
the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
via a constant rather than relying on gro_max_size.
[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code for gso_max_size was added originally to allow for debugging and
workaround of buggy devices that couldn't support TSO with blocks 64K in
size. The original reason for limiting it to 64K was because that was the
existing limits of IPv4 and non-jumbogram IPv6 length fields.
With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.
So in order to support this we need to go through and clean up the
remaining users of the gso_max_size value so that the values will cap at
64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
limit for GSO_MAX_SIZE.
v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
in a new sk_trim_gso_size() helper.
netif_set_tso_max_size() caps the requested TSO size
with GSO_MAX_SIZE.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though
some small parts are the same as R-Car Gen2.
Other differences to R-Car Gen3 and Gen2 are:
* It has separate data (DI), error (Line 1) and management (Line 2) irqs
rather than one irq for all three.
* Instead of using the High-speed peripheral bus clock for gPTP, it has a
separate gPTP reference clock.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RZ/V2M has a separate gPTP reference clock that is used when the
AVB-DMAC Mode Register (CCC) gPTP Clock Select (CSEL) bits are
set to "01: High-speed peripheral bus clock".
Therefore, add a feature that allows this clock to be used for
gPTP.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
R-Car has a combined interrupt line, ch22 = Line0_DiA | Line1_A | Line2_A.
RZ/V2M has separate interrupt lines for each of these, so add a feature
that allows the driver to get these interrupts and call the common handler.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when the HW has a single interrupt, the driver uses the
GIC, TIC, RIC0 registers to enable and disable interrupts.
When the HW has multiple interrupts, it uses the GIE, GID, TIE, TID,
RIE0, RID0 registers.
However, other devices, e.g. RZ/V2M, have multiple irqs and only have
the GIC, TIC, RIC0 registers.
Therefore, split this into a separate feature.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ADQ queue groups (TCs) are created via tc mqprio command,
RSS contexts and associated RSS indirection tables are configured
automatically per TC based on the queue ranges specified for
each traffic class.
For ex:
tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \
queues 2@0 8@2 4@10 hw 1 mode channel
will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4
in 3 queue groups. Each queue group is associated with its
own RSS context and RSS indirection table.
Add support to expose RSS indirection tables for all ADQ queue
groups using ethtool RSS contexts interface.
ethtool -x enp175s0f0 context <tc-num>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220512213249.3747424-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220512212621.3746140-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove napi_weight statics which are set to 64 and never modified,
remnants of the out-of-tree napi_weight module param.
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220512205603.1536771-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-05-13
1) Cleanups for the code behind the XFRM offload API. This is a
preparation for the extension of the API for policy offload.
From Leon Romanovsky.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: drop not needed flags variable in XFRM offload struct
net/mlx5e: Use XFRM state direction instead of flags
netdevsim: rely on XFRM state direction instead of flags
ixgbe: propagate XFRM offload state direction instead of flags
xfrm: store and rely on direction to construct offload flags
xfrm: rename xfrm_state_offload struct to allow reuse
xfrm: delete not used number of external headers
xfrm: free not used XFRM_ESP_NO_TRAILER flag
====================
Link: https://lore.kernel.org/r/20220513151218.4010119-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If CONFIG_PTP_1588_CLOCK=m and CONFIG_SFC_SIENA=y, the siena driver will fail to link:
drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_remove_channel':
ptp.c:(.text+0xa28): undefined reference to `ptp_clock_unregister'
drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_probe_channel':
ptp.c:(.text+0x13a0): undefined reference to `ptp_clock_register'
ptp.c:(.text+0x1470): undefined reference to `ptp_clock_unregister'
drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_pps_worker':
ptp.c:(.text+0x1d29): undefined reference to `ptp_clock_event'
drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_siena_ptp_get_ts_info':
ptp.c:(.text+0x301b): undefined reference to `ptp_clock_index'
To fix this build error, make SFC_SIENA depends on PTP_1588_CLOCK.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d48523cb88e0 ("sfc: Copy shared files needed for Siena (part 2)")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220513012721.140871-1-renzhijie2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of always returning -ENOPKG, decode the firmware error
code further when the HWRM_NVM_INSTALL_UPDATE firmware call fails.
Return a more suitable error code to userspace and log an error
in dmesg.
This is version 2 of the earlier patch that was reverted:
02acd399533e ("bnxt_en: parse result field when NVRAM package install fails")
In this new version, if the call is made through devlink instead of
ethtool, we'll also set the error message in extack.
Link: https://lore.kernel.org/netdev/20220307141358.4d52462e@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add driver support to enable timestamping on all RX packets
that are received by the NIC. This capability can be requested
by the applications using SIOCSHWTSTAMP ioctl with filter type
HWTSTAMP_FILTER_ALL.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For correctness, we need to configure the packet filters for timestamping
during bnxt_open. This way they are always configured after firmware
reset or chip reset. We should not assume that the filters will always
be retained across resets.
This patch modifies the ioctl handler and always configures the PTP
filters in the bnxt_open() path.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The main changes are timestamp support for all RX packets and new PCIe
statistics.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.
Switch to using a NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact.
A separate poll handler is used for TX and RX since they have separate
IRQs on this controller, so that the completion work for each of them
stays on the same CPU as the interrupt.
Testing on a Xilinx MPSoC ZU9EG platform using iperf3 from a Linux PC
through a switch at 1G link speed showed no significant change in TX or
RX throughput, with approximately 941 Mbps before and after. Hard IRQ
time in the TX throughput test was significantly reduced from 12% to
below 1% on the CPU handling TX interrupts, with total hard+soft IRQ CPU
usage dropping from about 56% down to 48%.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The axienet_start_xmit function was updating the tx_bd_tail variable
multiple times, with potential rollbacks on error or invalid
intermediate positions, even though this variable is also used in the
TX completion path. Use READ_ONCE where this variable is read and
WRITE_ONCE where it is written to make this update more atomic, and
move the write before the MMIO write to start the transfer, so it is
protected by that implicit write barrier.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If reading the Interrupt Source Flag register fails with -ENODEV, then
the PHY has been hot-removed and the correct response is to bail out
instead of throwing a WARN splat and attempting to suspend the PHY.
The PHY should be stopped in due course anyway as the kernel
asynchronously tears down the device.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cache the interrupt mask to avoid re-reading it from the PHY upon every
interrupt.
This will simplify a subsequent commit which detects hot-removal in the
interrupt handler and bails out.
Analyzing and debugging PHY transactions also becomes simpler if such
redundant reads are avoided.
Last not least, interrupt overhead and latency is slightly improved.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Link status of SMSC LAN95xx chips is polled once per second, even though
they're capable of signaling PHY interrupts through the MAC layer.
Forward those interrupts to the PHY driver to avoid polling. Benefits
are reduced bus traffic, reduced CPU overhead and quicker interface
bringup.
Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
smsc95xx: fix link detection for disabled autonegotiation").
Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
hence couldn't detect link-up events when auto-negotiation was disabled.
The proper solution would have been to enable the ENERGYON interrupt
instead of polling.
Since then, PHY handling was moved from the LAN95xx driver to the SMSC
PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
That PHY driver is capable of link detection with auto-negotiation
disabled because it enables the ENERGYON interrupt.
Note that signaling interrupts through the MAC layer not only works with
the integrated PHY, but also with an external PHY, provided its
interrupt pin is attached to LAN95xx's nPHY_INT pin.
In the unlikely event that the interrupt pin of an external PHY is
attached to a GPIO of the SoC (or not connected at all), the driver can
be amended to retrieve the irq from the PHY's of_node.
To forward PHY interrupts to phylib, it is not sufficient to call
phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run
so that PHY interrupts are cleared. That's because according to page
119 of the LAN950x datasheet, "The source of this interrupt is a level.
The interrupt persists until it is cleared in the PHY."
https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
Therefore, create an IRQ domain with a single IRQ for the PHY. In the
future, the IRQ domain may be extended to support the 11 GPIOs on the
LAN95xx.
Normally the PHY interrupt should be masked until the PHY driver has
cleared it. However masking requires a (sleeping) USB transaction and
interrupts are received in (non-sleepable) softirq context. I decided
not to mask the interrupt at all (by using the dummy_irq_chip's noop
->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec
intervals and normally that's sufficient to wake the PHY driver's IRQ
thread and have it clear the interrupt. If it does take longer, worst
thing that can happen is the IRQ thread is woken again. No big deal.
Because PHY interrupts are now perpetually enabled, there's no need to
selectively enable them on suspend. So remove all invocations of
smsc95xx_enable_phy_wakeup_interrupts().
In smsc95xx_resume(), move the call of phy_init_hw() before
usbnet_resume() (which restarts the status URB) to ensure that the PHY
is fully initialized when an interrupt is handled.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
Cc: Andre Edich <andre.edich@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the
MAC full duplex mode and PHY flow control registers based on cached data
in struct phy_device:
smsc95xx_status() # raises EVENT_LINK_RESET
usbnet_deferred_kevent()
smsc95xx_link_reset() # uses cached data in phydev
Simultaneously, phylib polls link status once per second and updates
that cached data:
phy_state_machine()
phy_check_link_status()
phy_read_status()
lan87xx_read_status()
genphy_read_status() # updates cached data in phydev
If smsc95xx_link_reset() wins the race against genphy_read_status(),
the registers may be updated based on stale data.
E.g. if the link was previously down, phydev->duplex is set to
DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even
though genphy_read_status() may update it to DUPLEX_FULL afterwards.
PHY interrupts are currently only enabled on suspend to trigger wakeup,
so the impact of the race is limited, but we're about to enable them
perpetually.
Avoid the race by delaying execution of smsc95xx_link_reset() until
phy_state_machine() has done its job and calls back via
smsc95xx_handle_link_change().
Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib
picks up link status changes through polling. So drop the declaration
of a ->link_reset() callback.
Note that the semicolon on a line by itself added in smsc95xx_status()
is a placeholder for a function call which will be added in a subsequent
commit. That function call will actually handle the INT_ENP_PHY_INT_
interrupt.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smsc95xx_reset() resets the PHY behind the PHY driver's back, which
seems like a bad idea generally. Remove that portion of the function.
We're about to use PHY interrupts instead of polling to detect link
changes on SMSC LAN95xx chips. Because smsc95xx_reset() is called from
usbnet_open(), PHY interrupt settings are lost whenever the net_device
is brought up.
There are two other callers of smsc95xx_reset(), namely smsc95xx_bind()
and smsc95xx_reset_resume(), and both may indeed benefit from a PHY
reset. However they already perform one through their calls to
phy_connect_direct() and phy_init_hw().
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Martyn Welch <martyn.welch@collabora.com>
Cc: Gabriel Hojda <ghojda@yo2urs.ro>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver
attempts to clear the signaled interrupts by writing "all ones" to the
Interrupt Status Register.
However the driver only ever enables a single type of interrupt, namely
the PHY Interrupt. And according to page 119 of the LAN950x datasheet,
its bit in the Interrupt Status Register is read-only. There's no other
way to clear it than in a separate PHY register:
https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
Consequently, writing "all ones" to the Interrupt Status Register is
pointless and can be dropped.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 2c9d6c2b871d ("usbnet: run unbind() before unregister_netdev()")
sought to fix a use-after-free on disconnect of USB Ethernet adapters.
It turns out that a different fix is necessary to address the issue:
https://lore.kernel.org/netdev/18b3541e5372bc9b9fc733d422f4e698c089077c.1650177997.git.lukas@wunner.de/
So the commit was not necessary.
The commit made binding and unbinding of USB Ethernet asymmetrical:
Before, usbnet_probe() first invoked the ->bind() callback and then
register_netdev(). usbnet_disconnect() mirrored that by first invoking
unregister_netdev() and then ->unbind().
Since the commit, the order in usbnet_disconnect() is reversed and no
longer mirrors usbnet_probe().
One consequence is that a PHY disconnected (and stopped) in ->unbind()
is afterwards stopped once more by unregister_netdev() as it closes the
netdev before unregistering. That necessitates a contortion in ->stop()
because the PHY may only be stopped if it hasn't already been
disconnected.
Reverting the commit allows making the call to phy_stop() unconditional
in ->stop().
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Cc: Martyn Welch <martyn.welch@collabora.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove .owner field if calls are used which set it automatically.
./drivers/net/ethernet/sunplus/spl2sw_driver.c:569:3-8: No need to set
.owner here. The core will do it.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clean the following coccicheck warning:
./drivers/net/ethernet/sunplus/spl2sw_driver.c:217:27-28: WARNING
opportunity for swap().
./drivers/net/ethernet/sunplus/spl2sw_driver.c:222:27-28: WARNING
opportunity for swap().
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
They were removed in the first series since they were not used for EF10.
Put that code back for Siena, with the prototypes in siena_sriov.h
since that file is a more applicable place for it.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change the clock name and work queue names to differentiate them from
the names used in sfc.ko.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a Siena Kconfig option and use it in stead of the sfc one.
Rename the internal variable for the 'mcdi_logging_default' module
parameter to avoid a naming conflict with the one in sfc.ko.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a Siena Kconfig option and use it in stead of the sfc one.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a Siena Kconfig option and use it in stead of the sfc one.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a Siena Kconfig option and use it in stead of the sfc one.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently the ocelot switch lib is unaware of the index of a struct
ocelot_port, since that is kept in the encapsulating structures of outer
drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port).
With the upcoming increase in complexity associated with assigning DSA
tag_8021q CPU ports to certain user ports, it becomes necessary for the
switch lib to be able to retrieve the index of a certain ocelot_port.
Therefore, introduce a new u8 to ocelot_port (same size as the chip_port
used by the ocelot switchdev driver) and rework the existing code to
populate and use it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The error handling for the current tagging protocol change procedure is
a bit brittle (we dismantle the previous tagging protocol entirely
before setting up the new one). By identifying which parts of a tagging
protocol are unique to itself and which parts are shared with the other,
we can implement a protocol change procedure where error handling is a
bit more robust, because we start setting up the new protocol first, and
tear down the old one only after the setup of the specific and shared
parts succeeded.
The protocol change is a bit too open-coded too, in the area of
migrating host flood settings and MDBs. By identifying what differs
between tagging protocols (the forwarding masks for host flooding) we
can implement a more straightforward migration procedure which is
handled in the shared portion of the protocol change, rather than
individually by each protocol.
Therefore, a more structured approach calls for the introduction of a
structure of function pointers per tagging protocol. This covers setup,
teardown and the host forwarding mask. In the future it will also cover
how to prepare for a new DSA master.
The initial tagging protocol setup (at driver probe time) and the final
teardown (at driver removal time) are also adapted to call into the
structured methods of the specific protocol in current use. This is
especially relevant for teardown, where we previously called
felix_del_tag_protocol() only for the first CPU port. But by not
specifying which CPU port this is for, we gain more flexibility to
support multiple CPU ports in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ocelot switches support a single active CPU port at a time (at least as
a trapping destination, i.e. for control traffic). This is true
regardless of whether we are using the native copy-to-CPU-port-module
functionality, or a redirect action towards the software-defined
tag_8021q CPU port.
Currently we assume that the trapping destination in tag_8021q mode is
the first CPU port, yet in the future we may want to migrate the user
ports to the second CPU port.
For that to work, we need to make sure that the tag_8021q trapping
destination is a CPU port that is active, i.e. is used by at least some
user port on which the trap was added. Otherwise, we may end up
redirecting the traffic to a CPU port which isn't even up.
Note that due to the current design where we simply choose the CPU port
of the first port from the trap's ingress port mask, it may be that a
CPU port absorbes control traffic from user ports which aren't affine to
it as per user space's request. This isn't ideal, but is the lesser of
two evils. Following the user-configured affinity for traps would mean
that we can no longer reuse a single TCAM entry for multiple traps,
which is what we actually do for e.g. PTP. Either we duplicate and
deduplicate TCAM entries on the fly when user-to-CPU-port mappings
change (which is unnecessarily complicated), or we redirect trapped
traffic to all tag_8021q CPU ports if multiple such ports are in use.
The latter would have actually been nice, if it actually worked, but it
doesn't, since a OCELOT_MASK_MODE_REDIRECT action towards multiple ports
would not take PGID_SRC into consideration, and it would just duplicate
the packet towards each (CPU) port, leading to duplicates in software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DSA has not supported (and probably will not support in the future
either) independent tagging protocols per CPU port.
Different switch drivers have different requirements, some may need to
replicate some settings for each CPU port, some may need to apply some
settings on a single CPU port, while some may have to configure some
global settings and then some per-CPU-port settings.
In any case, the current model where DSA calls ->change_tag_protocol for
each CPU port turns out to be impractical for drivers where there are
global things to be done. For example, felix calls dsa_tag_8021q_register(),
which makes no sense per CPU port, so it suppresses the second call.
Let drivers deal with replication towards all CPU ports, and remove the
CPU port argument from the function prototype.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU
ports") - not introducing a dedicated switch callback for host flooding
made sense, because for the only user, the felix driver, there was
nothing different to do for the CPU port than set the flood flags on the
CPU port just like on any other bridge port.
There are 2 reasons why this approach is not good enough, however.
(1) Other drivers, like sja1105, support configuring flooding as a
function of {ingress port, egress port}, whereas the DSA
->port_bridge_flags() function only operates on an egress port.
So with that driver we'd have useless host flooding from user ports
which don't need it.
(2) Even with the felix driver, support for multiple CPU ports makes it
difficult to piggyback on ->port_bridge_flags(). The way in which
the felix driver is going to support host-filtered addresses with
multiple CPU ports is that it will direct these addresses towards
both CPU ports (in a sort of multicast fashion), then restrict the
forwarding to only one of the two using the forwarding masks.
Consequently, flooding will also be enabled towards both CPU ports.
However, ->port_bridge_flags() gets passed the index of a single CPU
port, and that leaves the flood settings out of sync between the 2
CPU ports.
This is to say, it's better to have a specific driver method for host
flooding, which takes the user port as argument. This solves problem (1)
by allowing the driver to do different things for different user ports,
and problem (2) by abstracting the operation and letting the driver do
whatever, rather than explicitly making the DSA core point to the CPU
port it thinks needs to be touched.
This new method also creates a problem, which is that cross-chip setups
are not handled. However I don't have hardware right now where I can
test what is the proper thing to do, and there isn't hardware compatible
with multi-switch trees that supports host flooding. So it remains a
problem to be tackled in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For symmetry with host FDBs and MDBs where the indirection is now
handled outside the ocelot switch lib, do the same for bridge port
flags (unicast/multicast/broadcast flooding).
The only caller of the ocelot switch lib which uses the NPI port is the
Felix DSA driver.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For symmetry with host FDBs where the indirection is now handled outside
the ocelot switch lib, do the same for host MDB entries. The only caller
of the ocelot switch lib which uses the NPI port is the Felix DSA driver.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I remembered why we had the host FDB migration procedure in place.
It is true that host FDB entry migration can be done by changing the
value of PGID_CPU, but the problem is that only host FDB entries learned
while operating in NPI mode go to PGID_CPU. When the CPU port operates
in tag_8021q mode, the FDB entries are learned towards the unicast PGID
equal to the physical port number of this CPU port, bypassing the
PGID_CPU indirection.
So host FDB entries learned in tag_8021q mode are not migrated any
longer towards the NPI port.
Fix this by extracting the NPI port -> PGID_CPU redirection from the
ocelot switch lib, moving it to the Felix DSA driver, and applying it
for any CPU port regardless of its kind (NPI or tag_8021q).
Fixes: a51c1c3f3218 ("net: dsa: felix: stop migrating FDBs back and forth on tag proto change")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The smatch found the following warning:
drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:736 lan966x_fdma_reload()
warn: 'rx_dcbs' was already freed.
This issue can happen when changing the MTU on one of the ports and once
the RX buffers are allocated and then the TX buffer allocation fails.
In that case the RX buffers should not be restore. This fix this issue
such that the RX buffers will not be restored if the TX buffers failed
to be allocated.
Fixes: 2ea1cbac267e2a ("net: lan966x: Update FDMA to change MTU.")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220511204059.2689199-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver currently has three interrupt counters,
which are incremented every time each interrupt handler
executes. These driver-managed counters are not
necessary as the kernel already has logic that manages
interrupt counts and exposes them via /proc/interrupts.
This patch removes the driver-managed counters.
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20220511135251.2989-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and bluetooth.
No outstanding fires.
Current release - regressions:
- eth: atlantic: always deep reset on pm op, fix null-deref
Current release - new code bugs:
- rds: use maybe_get_net() when acquiring refcount on TCP sockets
[refinement of a previous fix]
- eth: ocelot: mark traps with a bool instead of guessing type based
on list membership
Previous releases - regressions:
- net: fix skipping features in for_each_netdev_feature()
- phy: micrel: fix null-derefs on suspend/resume and probe
- bcmgenet: check for Wake-on-LAN interrupt probe deferral
Previous releases - always broken:
- ipv4: drop dst in multicast routing path, prevent leaks
- ping: fix address binding wrt vrf
- net: fix wrong network header length when BPF protocol translation
is used on skbs with a fraglist
- bluetooth: fix the creation of hdev->name
- rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
- wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing
- wifi: ath11k: reduce the wait time of 11d scan and hw scan while
adding an interface
- mac80211: fix rx reordering with non explicit / psmp ack policy
- mac80211: reset MBSSID parameters upon connection
- nl80211: fix races in nl80211_set_tx_bitrate_mask()
- tls: fix context leak on tls_device_down
- sched: act_pedit: really ensure the skb is writable
- batman-adv: don't skb_split skbuffs with frag_list
- eth: ocelot: fix various issues with TC actions (null-deref; bad
stats; ineffective drops; ineffective filter removal)"
* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
tls: Fix context leak on tls_device_down
net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
mlxsw: Avoid warning during ip6gre device removal
net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
net: ethernet: mediatek: ppe: fix wrong size passed to memset()
Bluetooth: Fix the creation of hdev->name
i40e: i40e_main: fix a missing check on list iterator
net/sched: act_pedit: really ensure the skb is writable
s390/lcs: fix variable dereferenced before check
s390/ctcm: fix potential memory leak
s390/ctcm: fix variable dereferenced before check
net: atlantic: verify hw_head_ lies within TX buffer ring
net: atlantic: add check for MAX_SKB_FRAGS
net: atlantic: reduce scope of is_rsc_complete
net: atlantic: fix "frag[0] not initialized"
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
net: phy: micrel: Fix incorrect variable type in micrel
decnet: Use container_of() for struct dn_neigh casts
...
|
|
In the NIC ->probe() callback, ->mtd_probe() callback is called.
If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too.
In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and
initializes mtd partiion.
But mtd partition for sfc is shared data.
So that allocated mtd partition data from last called
efx_ef10_mtd_probe() will not be used.
Therefore it must be freed.
But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe().
kmemleak reports:
unreferenced object 0xffff88811ddb0000 (size 63168):
comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120
[<ffffffffa3873f0e>] __kmalloc+0x20e/0x250
[<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc]
[<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc]
[<ffffffffa414192c>] local_pci_probe+0xdc/0x170
[<ffffffffa4145df5>] pci_device_probe+0x235/0x680
[<ffffffffa443dd52>] really_probe+0x1c2/0x8f0
[<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460
[<ffffffffa443e92a>] driver_probe_device+0x4a/0x120
[<ffffffffa443f2ae>] __driver_attach+0x16e/0x320
[<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190
[<ffffffffa443b75e>] bus_add_driver+0x39e/0x560
[<ffffffffa4440b1e>] driver_register+0x18e/0x310
[<ffffffffc02e2055>] 0xffffffffc02e2055
[<ffffffffa3001af3>] do_one_initcall+0xc3/0x450
[<ffffffffa33ca574>] do_init_module+0x1b4/0x700
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of
state->link") the interface suspend path would call our mac_link_down()
call back which would forcibly set the link down, thus preventing
Wake-on-LAN packets from reaching our management port.
Fix this by looking at whether the port is enabled for Wake-on-LAN and
not clearing the link status in that case to let packets go through.
Fixes: 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPv6 addresses which are used for tunnels are stored in a hash table
with reference counting. When a new GRE tunnel is configured, the driver
is notified and configures it in hardware.
Currently, any change in the tunnel is not applied in the driver. It
means that if the remote address is changed, the driver is not aware of
this change and the first address will be used.
This behavior results in a warning [1] in scenarios such as the
following:
# ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit
# ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit
# ip link delete gre1
The change of the address is not applied in the driver. Currently, the
driver uses the remote address which is stored in the 'parms' of the
overlay device. When the tunnel is removed, the new IPv6 address is
used, the driver tries to release it, but as it is not aware of the
change, this address is not configured and it warns about releasing non
existing IPv6 address.
Fix it by using the IPv6 address which is cached in the IPIP entry, this
address is the last one that the driver used, so even in cases such the
above, the first address will be released, without any warning.
[1]:
WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
Call Trace:
<TASK>
mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum]
mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum]
mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum]
raw_notifier_call_chain+0x3c/0x50
call_netdevice_notifiers_info+0x2f/0x80
unregister_netdevice_many+0x311/0x6d0
rtnl_dellink+0x136/0x360
rtnetlink_rcv_msg+0x12f/0x380
netlink_rcv_skb+0x49/0xf0
netlink_unicast+0x233/0x340
netlink_sendmsg+0x202/0x440
____sys_sendmsg+0x1f3/0x220
___sys_sendmsg+0x70/0xb0
__sys_sendmsg+0x54/0xa0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add VF rate limit feature
This patch enhances the NFP driver to supports assignment of
both max_tx_rate and min_tx_rate to VFs
The template of configurations below is all supported.
e.g.
# ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE \
min_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE \
max_tx_rate $RATE_VALUE
The max RATE_VALUE is limited to 0xFFFF which is about
63Gbps (using 1024 for 1G)
Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|