diff options
Diffstat (limited to 'Documentation/networking/devlink/mlx5.rst')
| -rw-r--r-- | Documentation/networking/devlink/mlx5.rst | 143 |
1 files changed, 138 insertions, 5 deletions
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 702f204a3dbd..4bba4d780a4a 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -15,23 +15,62 @@ Parameters * - Name - Mode - Validation + - Notes * - ``enable_roce`` - driverinit - - Type: Boolean - - If the device supports RoCE disablement, RoCE enablement state controls + - Boolean + - If the device supports RoCE disablement, RoCE enablement state controls device support for RoCE capability. Otherwise, the control occurs in the driver stack. When RoCE is disabled at the driver level, only raw ethernet QPs are supported. * - ``io_eq_size`` - driverinit - The range is between 64 and 4096. + - * - ``event_eq_size`` - driverinit - The range is between 64 and 4096. + - * - ``max_macs`` - driverinit - The range is between 1 and 2^31. Only power of 2 values are supported. + - + * - ``enable_sriov`` + - permanent + - Boolean + - Applies to each physical function (PF) independently, if the device + supports it. Otherwise, it applies symmetrically to all PFs. + * - ``total_vfs`` + - permanent + - The range is between 1 and a device-specific max. + - Applies to each physical function (PF) independently, if the device + supports it. Otherwise, it applies symmetrically to all PFs. + +Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW reset to take effect + +.. code-block:: bash + + # setup parameters + devlink dev param set pci/0000:01:00.0 name enable_sriov value true cmode permanent + devlink dev param set pci/0000:01:00.0 name total_vfs value 8 cmode permanent + + # Fw reset + devlink dev reload pci/0000:01:00.0 action fw_activate + + # for PCI related config such as sriov PCI reset/rescan is required: + echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove + echo 1 >/sys/bus/pci/rescan + grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_* + + * - ``num_doorbells`` + - driverinit + - This controls the number of channel doorbells used by the netdev. In all + cases, an additional doorbell is allocated and used for non-channel + communication (e.g. for PTP, HWS, etc.). Supported values are: + + - 0: No channel-specific doorbells, use the global one for everything. + - [1, max_num_channels]: Spread netdev channels equally across these + doorbells. The ``mlx5`` driver also implements the following driver-specific parameters. @@ -53,6 +92,9 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + * ``hmfs`` Hardware managed flow steering. In HMFS mode, the driver + is configuring steering rules directly to the HW using Work Queues with + a special new type of WQE (Work Queue Element). SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode. @@ -97,6 +139,10 @@ parameters. When metadata is disabled, the above use cases will fail to initialize if users try to enable them. + + Note: Setting this parameter does not take effect immediately. Setting + must happen in legacy mode and eswitch port metadata takes effect after + enabling switchdev mode. * - ``hairpin_num_queues`` - u32 - driverinit @@ -109,6 +155,82 @@ parameters. - u32 - driverinit - Control the size (in packets) of the hairpin queues. + * - ``pcie_cong_inbound_high`` + - u16 + - driverinit + - High threshold configuration for PCIe congestion events. The firmware + will send an event once device side inbound PCIe traffic went + above the configured high threshold for a long enough period (at least + 200ms). + + See pci_bw_inbound_high ethtool stat. + + Units are 0.01 %. Accepted values are in range [0, 10000]. + pcie_cong_inbound_low < pcie_cong_inbound_high. + Default value: 9000 (Corresponds to 90%). + * - ``pcie_cong_inbound_low`` + - u16 + - driverinit + - Low threshold configuration for PCIe congestion events. The firmware + will send an event once device side inbound PCIe traffic went + below the configured low threshold, only after having been previously in + a congested state. + + See pci_bw_inbound_low ethtool stat. + + Units are 0.01 %. Accepted values are in range [0, 10000]. + pcie_cong_inbound_low < pcie_cong_inbound_high. + Default value: 7500. + * - ``pcie_cong_outbound_high`` + - u16 + - driverinit + - High threshold configuration for PCIe congestion events. The firmware + will send an event once device side outbound PCIe traffic went + above the configured high threshold for a long enough period (at least + 200ms). + + See pci_bw_outbound_high ethtool stat. + + Units are 0.01 %. Accepted values are in range [0, 10000]. + pcie_cong_outbound_low < pcie_cong_outbound_high. + Default value: 9000 (Corresponds to 90%). + * - ``pcie_cong_outbound_low`` + - u16 + - driverinit + - Low threshold configuration for PCIe congestion events. The firmware + will send an event once device side outbound PCIe traffic went + below the configured low threshold, only after having been previously in + a congested state. + + See pci_bw_outbound_low ethtool stat. + + Units are 0.01 %. Accepted values are in range [0, 10000]. + pcie_cong_outbound_low < pcie_cong_outbound_high. + Default value: 7500. + + * - ``cqe_compress_type`` + - string + - permanent + - Configure which mechanism/algorithm should be used by the NIC that will + affect the rate (aggressiveness) of compressed CQEs depending on PCIe bus + conditions and other internal NIC factors. This mode affects all queues + that enable compression. + * ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance + * ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads + + * - ``swp_l4_csum_mode`` + - string + - permanent + - Configure how the L4 checksum is calculated by the device when using + Software Parser (SWP) hints for header locations. + + * ``default`` : Use the device's default checksum calculation + mode. The driver will discover during init whether or + full_csum or l4_only is in use. Setting this value explicitly + from userspace is not allowed, but some firmware versions may + return this value on param read. + * ``full_csum`` : Calculate full checksum including the pseudo-header + * ``l4_only`` : Calculate L4-only checksum, excluding the pseudo-header The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` @@ -246,7 +368,7 @@ them in realtime. Description of the vnic counters: -- total_q_under_processor_handle +- total_error_queues number of queues in an error state due to an async error or errored command. - send_queue_priority_update_flow @@ -255,7 +377,8 @@ Description of the vnic counters: number of times CQ entered an error state due to an overflow. - async_eq_overrun number of times an EQ mapped to async events was overrun. - comp_eq_overrun number of times an EQ mapped to completion events was +- comp_eq_overrun + number of times an EQ mapped to completion events was overrun. - quota_exceeded_command number of commands issued and failed due to quota exceeded. @@ -272,6 +395,16 @@ Description of the vnic counters: number of packets handled by the VNIC experiencing unexpected steering failure (at any point in steering flow owned by the VNIC, including the FDB for the eswitch owner). +- icm_consumption + amount of Interconnect Host Memory (ICM) consumed by the vnic in + granularity of 4KB. ICM is host memory allocated by SW upon HCA request + and is used for storing data structures that control HCA operation. +- bar_uar_access + number of WRITE or READ access operations to the UAR on the PCIe BAR. +- odp_local_triggered_page_fault + number of locally-triggered page-faults due to ODP. +- odp_remote_triggered_page_fault + number of remotly-triggered page-faults due to ODP. User commands examples: |
