diff options
Diffstat (limited to 'Documentation/networking/device_drivers/ethernet/amazon/ena.rst')
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/amazon/ena.rst | 154 |
1 files changed, 154 insertions, 0 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index 8bcb173e0353..14784a0a6a8a 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -38,6 +38,7 @@ debug logs. Some of the ENA devices support a working mode called Low-latency Queue (LLQ), which saves several more microseconds. + ENA Source Code Directory Structure =================================== @@ -53,7 +54,11 @@ ena_common_defs.h Common definitions for ena_com layer. ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers. ena_netdev.[ch] Main Linux kernel driver. ena_ethtool.c ethtool callbacks. +ena_xdp.[ch] XDP files ena_pci_id_tbl.h Supported device IDs. +ena_phc.[ch] PTP hardware clock infrastructure (see `PHC`_ for more info) +ena_devlink.[ch] devlink files. +ena_debugfs.[ch] debugfs files. ================= ====================================================== Management Interface: @@ -205,12 +210,113 @@ Adaptive coalescing can be switched on/off through `ethtool(8)`'s More information about Adaptive Interrupt Moderation (DIM) can be found in Documentation/networking/net_dim.rst +.. _`RX copybreak`: + RX copybreak ============ + The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK and can be configured by the ETHTOOL_STUNABLE command of the SIOCETHTOOL ioctl. +This option controls the maximum packet length for which the RX +descriptor it was received on would be recycled. When a packet smaller +than RX copybreak bytes is received, it is copied into a new memory +buffer and the RX descriptor is returned to HW. + +.. _`PHC`: + +PTP Hardware Clock (PHC) +======================== +.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api +.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c + +ENA Linux driver supports PTP hardware clock providing timestamp reference to achieve nanosecond resolution. + +**PHC support** + +PHC depends on the PTP module, which needs to be either loaded as a module or compiled into the kernel. + +Verify if the PTP module is present: + +.. code-block:: shell + + grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-`uname -r` + +- If no output is provided, the ENA driver cannot be loaded with PHC support. + +**PHC activation** + +The feature is turned off by default, in order to turn the feature on, the ENA driver +can be loaded in the following way: + +- devlink: + +.. code-block:: shell + + sudo devlink dev param set pci/<domain:bus:slot.function> name enable_phc value true cmode driverinit + sudo devlink dev reload pci/<domain:bus:slot.function> + # for example: + sudo devlink dev param set pci/0000:00:06.0 name enable_phc value true cmode driverinit + sudo devlink dev reload pci/0000:00:06.0 + +All available PTP clock sources can be tracked here: + +.. code-block:: shell + + ls /sys/class/ptp + +PHC support and capabilities can be verified using ethtool: + +.. code-block:: shell + + ethtool -T <interface> + +**PHC timestamp** + +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_: + +.. code-block:: shell + + testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1 + +PHC get time requests should be within reasonable bounds, +avoid excessive utilization to ensure optimal performance and efficiency. +The ENA device restricts the frequency of PHC get time requests to a maximum +of 125 requests per second. If this limit is surpassed, the get time request +will fail, leading to an increment in the phc_err_ts statistic. + +**PHC statistics** + +PHC can be monitored using debugfs (if mounted): + +.. code-block:: shell + + sudo cat /sys/kernel/debug/<domain:bus:slot.function>/phc_stats + + # for example: + sudo cat /sys/kernel/debug/0000:00:06.0/phc_stats + +PHC errors must remain below 1% of all PHC requests to maintain the desired level of accuracy and reliability + +================= ====================================================== +**phc_cnt** | Number of successful retrieved timestamps (below expire timeout). +**phc_exp** | Number of expired retrieved timestamps (above expire timeout). +**phc_skp** | Number of skipped get time attempts (during block period). +**phc_err_dv** | Number of failed get time attempts due to device errors (entering into block state). +**phc_err_ts** | Number of failed get time attempts due to timestamp errors (entering into block state), + | This occurs if driver exceeded the request limit or device received an invalid timestamp. +================= ====================================================== + +PHC timeouts: + +================= ====================================================== +**expire** | Max time for a valid timestamp retrieval, passing this threshold will fail + | the get time request and block new requests until block timeout. +**block** | Blocking period starts once get time request expires or fails, + | all get time requests during block period will be skipped. +================= ====================================================== + Statistics ========== @@ -220,6 +326,11 @@ per-queue stats) from the device. In addition the driver logs the stats to syslog upon device reset. +On supported instance types, the statistics will also include the +ENA Express data (fields prefixed with `ena_srd`). For a complete +documentation of ENA Express data refer to +https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-express.html#ena-express-monitor + MTU === @@ -253,6 +364,18 @@ RSS - The user can provide a hash key, hash function, and configure the indirection table through `ethtool(8)`. +DEVLINK SUPPORT +=============== +.. _`devlink`: https://www.kernel.org/doc/html/latest/networking/devlink/index.html + +`devlink`_ supports reloading the driver and initiating re-negotiation with the ENA device + +.. code-block:: shell + + sudo devlink dev reload pci/<domain:bus:slot.function> + # for example: + sudo devlink dev reload pci/0000:00:06.0 + DATA PATH ========= @@ -315,3 +438,34 @@ Rx - The new SKB is updated with the necessary information (protocol, checksum hw verify result, etc), and then passed to the network stack, using the NAPI interface function :code:`napi_gro_receive()`. + +Dynamic RX Buffers (DRB) +------------------------ + +Each RX descriptor in the RX ring is a single memory page (which is either 4KB +or 16KB long depending on system's configurations). +To reduce the memory allocations required when dealing with a high rate of small +packets, the driver tries to reuse the remaining RX descriptor's space if more +than 2KB of this page remain unused. + +A simple example of this mechanism is the following sequence of events: + +:: + + 1. Driver allocates page-sized RX buffer and passes it to hardware + +----------------------+ + |4KB RX Buffer | + +----------------------+ + + 2. A 300Bytes packet is received on this buffer + + 3. The driver increases the ref count on this page and returns it back to + HW as an RX buffer of size 4KB - 300Bytes = 3796 Bytes + +----+--------------------+ + |****|3796 Bytes RX Buffer| + +----+--------------------+ + +This mechanism isn't used when an XDP program is loaded, or when the +RX packet is less than rx_copybreak bytes (in which case the packet is +copied out of the RX buffer into the linear part of a new skb allocated +for it and the RX buffer remains the same size, see `RX copybreak`_). |
