summaryrefslogtreecommitdiff
path: root/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/device_drivers/ethernet/amazon/ena.rst')
-rw-r--r--Documentation/networking/device_drivers/ethernet/amazon/ena.rst154
1 files changed, 154 insertions, 0 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
index 8bcb173e0353..14784a0a6a8a 100644
--- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
+++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
@@ -38,6 +38,7 @@ debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds.
+
ENA Source Code Directory Structure
===================================
@@ -53,7 +54,11 @@ ena_common_defs.h Common definitions for ena_com layer.
ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers.
ena_netdev.[ch] Main Linux kernel driver.
ena_ethtool.c ethtool callbacks.
+ena_xdp.[ch] XDP files
ena_pci_id_tbl.h Supported device IDs.
+ena_phc.[ch] PTP hardware clock infrastructure (see `PHC`_ for more info)
+ena_devlink.[ch] devlink files.
+ena_debugfs.[ch] debugfs files.
================= ======================================================
Management Interface:
@@ -205,12 +210,113 @@ Adaptive coalescing can be switched on/off through `ethtool(8)`'s
More information about Adaptive Interrupt Moderation (DIM) can be found in
Documentation/networking/net_dim.rst
+.. _`RX copybreak`:
+
RX copybreak
============
+
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
+This option controls the maximum packet length for which the RX
+descriptor it was received on would be recycled. When a packet smaller
+than RX copybreak bytes is received, it is copied into a new memory
+buffer and the RX descriptor is returned to HW.
+
+.. _`PHC`:
+
+PTP Hardware Clock (PHC)
+========================
+.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api
+.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c
+
+ENA Linux driver supports PTP hardware clock providing timestamp reference to achieve nanosecond resolution.
+
+**PHC support**
+
+PHC depends on the PTP module, which needs to be either loaded as a module or compiled into the kernel.
+
+Verify if the PTP module is present:
+
+.. code-block:: shell
+
+ grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-`uname -r`
+
+- If no output is provided, the ENA driver cannot be loaded with PHC support.
+
+**PHC activation**
+
+The feature is turned off by default, in order to turn the feature on, the ENA driver
+can be loaded in the following way:
+
+- devlink:
+
+.. code-block:: shell
+
+ sudo devlink dev param set pci/<domain:bus:slot.function> name enable_phc value true cmode driverinit
+ sudo devlink dev reload pci/<domain:bus:slot.function>
+ # for example:
+ sudo devlink dev param set pci/0000:00:06.0 name enable_phc value true cmode driverinit
+ sudo devlink dev reload pci/0000:00:06.0
+
+All available PTP clock sources can be tracked here:
+
+.. code-block:: shell
+
+ ls /sys/class/ptp
+
+PHC support and capabilities can be verified using ethtool:
+
+.. code-block:: shell
+
+ ethtool -T <interface>
+
+**PHC timestamp**
+
+To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_:
+
+.. code-block:: shell
+
+ testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1
+
+PHC get time requests should be within reasonable bounds,
+avoid excessive utilization to ensure optimal performance and efficiency.
+The ENA device restricts the frequency of PHC get time requests to a maximum
+of 125 requests per second. If this limit is surpassed, the get time request
+will fail, leading to an increment in the phc_err_ts statistic.
+
+**PHC statistics**
+
+PHC can be monitored using debugfs (if mounted):
+
+.. code-block:: shell
+
+ sudo cat /sys/kernel/debug/<domain:bus:slot.function>/phc_stats
+
+ # for example:
+ sudo cat /sys/kernel/debug/0000:00:06.0/phc_stats
+
+PHC errors must remain below 1% of all PHC requests to maintain the desired level of accuracy and reliability
+
+================= ======================================================
+**phc_cnt** | Number of successful retrieved timestamps (below expire timeout).
+**phc_exp** | Number of expired retrieved timestamps (above expire timeout).
+**phc_skp** | Number of skipped get time attempts (during block period).
+**phc_err_dv** | Number of failed get time attempts due to device errors (entering into block state).
+**phc_err_ts** | Number of failed get time attempts due to timestamp errors (entering into block state),
+ | This occurs if driver exceeded the request limit or device received an invalid timestamp.
+================= ======================================================
+
+PHC timeouts:
+
+================= ======================================================
+**expire** | Max time for a valid timestamp retrieval, passing this threshold will fail
+ | the get time request and block new requests until block timeout.
+**block** | Blocking period starts once get time request expires or fails,
+ | all get time requests during block period will be skipped.
+================= ======================================================
+
Statistics
==========
@@ -220,6 +326,11 @@ per-queue stats) from the device.
In addition the driver logs the stats to syslog upon device reset.
+On supported instance types, the statistics will also include the
+ENA Express data (fields prefixed with `ena_srd`). For a complete
+documentation of ENA Express data refer to
+https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-express.html#ena-express-monitor
+
MTU
===
@@ -253,6 +364,18 @@ RSS
- The user can provide a hash key, hash function, and configure the
indirection table through `ethtool(8)`.
+DEVLINK SUPPORT
+===============
+.. _`devlink`: https://www.kernel.org/doc/html/latest/networking/devlink/index.html
+
+`devlink`_ supports reloading the driver and initiating re-negotiation with the ENA device
+
+.. code-block:: shell
+
+ sudo devlink dev reload pci/<domain:bus:slot.function>
+ # for example:
+ sudo devlink dev reload pci/0000:00:06.0
+
DATA PATH
=========
@@ -315,3 +438,34 @@ Rx
- The new SKB is updated with the necessary information (protocol,
checksum hw verify result, etc), and then passed to the network
stack, using the NAPI interface function :code:`napi_gro_receive()`.
+
+Dynamic RX Buffers (DRB)
+------------------------
+
+Each RX descriptor in the RX ring is a single memory page (which is either 4KB
+or 16KB long depending on system's configurations).
+To reduce the memory allocations required when dealing with a high rate of small
+packets, the driver tries to reuse the remaining RX descriptor's space if more
+than 2KB of this page remain unused.
+
+A simple example of this mechanism is the following sequence of events:
+
+::
+
+ 1. Driver allocates page-sized RX buffer and passes it to hardware
+ +----------------------+
+ |4KB RX Buffer |
+ +----------------------+
+
+ 2. A 300Bytes packet is received on this buffer
+
+ 3. The driver increases the ref count on this page and returns it back to
+ HW as an RX buffer of size 4KB - 300Bytes = 3796 Bytes
+ +----+--------------------+
+ |****|3796 Bytes RX Buffer|
+ +----+--------------------+
+
+This mechanism isn't used when an XDP program is loaded, or when the
+RX packet is less than rx_copybreak bytes (in which case the packet is
+copied out of the RX buffer into the linear part of a new skb allocated
+for it and the RX buffer remains the same size, see `RX copybreak`_).