Age | Commit message (Collapse) | Author |
|
Support ethtool -g / -G. Leverage the code added for -l / -L
to alloc / stop / start / free.
Check parameters against HW min/max but also our own min/max.
Min HW queue is 16 entries, we can't deal with TWQs that small
because of the queue waking logic. Add similar contraint on RCQ
for symmetry.
We need 3 sizes on Rx, as the NIC does header-data split two separate
buffer pools:
(1) head page ring - how many empty pages we post for headers
(2) payload page ring - how many empty pages we post for payloads
(3) completion ring - where NIC produces the Rx descriptors
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250306145150.1757263-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add ethtool support to configure the IRQ coalescing behavior. Support
separate timers for Rx and Tx for time based coalescing. For frame based
configuration, currently we only support the Rx side.
The hardware allows configuration of descriptor count instead of frame
count requiring conversion between the two. We assume 2 descriptors
per frame, one for the metadata and one for the data segment.
When rx-frames are not configured, we set the RX descriptor count to
half the ring size as a fail safe.
Default configuration:
ethtool -c eth0 | grep -E "rx-usecs:|tx-usecs:|rx-frames:"
rx-usecs: 30
rx-frames: 0
tx-usecs: 35
IRQ rate test:
With single iperf flow we monitor IRQ rate while changing the tx-usesc and
rx-usecs to high and low values.
ethtool -C eth0 rx-frames 8192 rx-usecs 150 tx-usecs 150
irq/sec 13k
irq/sec 14k
irq/sec 14k
ethtool -C eth0 rx-frames 8192 rx-usecs 10 tx-usecs 10
irq/sec 27k
irq/sec 28k
irq/sec 28k
Validating the use of extack:
ethtool -C eth0 rx-frames 16384
netlink error: fbnic: rx_frames is above device max
netlink error: Invalid argument
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250218023520.2038010-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add TSO support to the driver. Device can handle unencapsulated or
IPv6-in-IPv6 packets. Any other tunnel stacks are handled with
GSO partial.
Validate that the packet can be offloaded in ndo_features_check.
Main thing we need to check for is that the header geometry can
be expressed in the decriptor fields (offsets aren't too large).
Report number of TSO super-packets via the qstat API.
Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gather and report software Tx queue stats - checksum stats
and queue stop / start.
Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gather and report software Rx queue stats - checksum stats
and allocation failures.
Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The queue stats struct is used for Rx and Tx queues. Wrap
the Tx stats in a struct and a union, so that we can reuse
the same space for Rx stats on Rx queues.
This also makes it easy to add an assert to the stat handling
code to catch new stats not being aggregated on shutdown.
Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the channel count changes. Copy the netdev priv,
allocate new channels using it. Stop, swap, start.
Then free the copy of the priv along with the channels it
holds, which are now the channels that used to be on the
real priv.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241220025241.1522781-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To simplify dealing with RTNL_ASSERT() requirements further
down the line, move setting queue count and NAPI<>queue
association to their own helpers.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change our method of swapping NAPIs without disturbing existing config.
This is primarily needed for "live reconfiguration" such as changing
the channel count when interface is already up.
Previously we were planning to use a trick of using shared interrupts.
We would install a second IRQ handler for the new NAPI, and make it
return IRQ_NONE until we were ready for it to take over. This works fine
functionally but breaks IRQ naming. The IRQ subsystem uses the IRQ name
to create the procfs entry, since both handlers used the same name
the second handler wouldn't get a proc directory registered.
When first one gets removed on success full ring count change
it would remove its directory and we would be left with none.
New approach uses a double pointer to the NAPI. The IRQ handler needs
to know how to locate the NAPI to schedule. We register a single IRQ handler
and give it a pointer to a pointer. We can then change what it points to
without re-registering. This may have a tiny perf impact, but really
really negligible.
Link: https://patch.msgid.link/20241220025241.1522781-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will need an array for storing NAPIs in the upcoming IRQ handler
reuse rework. Replace the current list we have, so that we are able
to reuse it later.
In a few places replace i as the iterator with t when we iterate
over triads, this seems slightly less confusing than having
i, j, k variables.
Link: https://patch.msgid.link/20241220025241.1522781-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add counters of packets with HW timestamps requests and lost timestamps
with no associated skbs. Use ethtool interface to report these counters.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add callbacks to support timestamping configuration via ethtool.
Add processing of RX timestamps.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Count packets, bytes and drop on the datapath, and report
to the user. Since queues are completely freed when the
device is down - accumulate the stats in the main netdev struct.
This means that per-queue stats will only report values since
last reset (per qstat recommendation).
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Handle Rx packets with basic csum and Rx hash offloads.
NIC writes back to the completion ring a head buffer descriptor
(data buffer allocated from header pages), variable number of payload
descriptors (data buffers in payload pages), an optional metadata
descriptor (type 2) and finally the primary metadata descriptor
(type 3).
This format makes scatter support fairly easy - start gathering
the pages when we see head page, gather until we see the primary
metadata descriptor, do the processing. Use XDP infra to collect
the packet fragments as we traverse the descriptors. XDP itself
is not supported yet, but it will be soon.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942839.1778861.10509071985738726125.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Handle Tx of simple packets. Support checksum offload and gather.
Use .ndo_features_check to make sure packet geometry will be
supported by the HW, i.e. we can fit the header lengths into
the descriptor fields.
The device writes to the completion rings the position of the tail
(consumer) pointer. Read all those writebacks, obviously the last
one will be the most recent, complete skbs up to that point.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942464.1778861.17919428039428796180.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement control path parts of Rx queue handling.
The NIC consumes memory in pages. It takes a full page and places
packets into it in a configurable manner (with the ability to define
headroom / tailroom as well as head alignment requirements).
As mentioned in prior patches there are two page submissions queues
one for packet headers and second (optional) for packet payloads.
For now feed both queues from a single page pool.
Use the page pool "fragment" API, as we can't predict upfront
how the page will be sliced.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079939092.1778861.3780136633831329550.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement basic management operations for Tx queues.
Allocate memory for submission and completion rings.
Learn how to start the queues, stop them, and wait for HW
to be idle.
We call HW rings "descriptor rings" (stored in ring->desc),
and SW context rings "buffer rings" (stored in ring->*_buf union).
This is the first patch which actually touches CSRs so add CSR
helpers.
No actual datapath / packet handling here, yet.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079938724.1778861.8329677776612865169.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allocate a netdev and figure out basics like how many queues
we need, MAC address, MTU bounds. Kick off a service task
to do various periodic things like health checking.
The service task only runs when device is open.
We have four levels of objects here:
- ring - A HW ring with head / tail pointers,
- triad - Two submission and one completion ring,
- NAPI - NAPI, with one IRQ and any number of Rx and Tx triads,
- Netdev - The ultimate container of the rings and napi vectors.
The "triad" is the only less-than-usual construct. On Rx we have
two "free buffer" submission rings, one for packet headers and
one for packet data. On Tx we have separate rings for XDP Tx
and normal Tx. So we ended up with ring triplets in both
directions.
We keep NAPIs on a local list, even though core already maintains a list.
Later on having a separate list will matter for live reconfig.
We introduce the list already, the churn would not be worth it.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079938358.1778861.11681469974633489463.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|