diff options
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst | 46 | ||||
-rw-r--r-- | Documentation/networking/devlink/devlink-port.rst | 122 | ||||
-rw-r--r-- | Documentation/networking/ethtool-netlink.rst | 31 | ||||
-rw-r--r-- | Documentation/networking/timestamping.rst | 32 |
4 files changed, 196 insertions, 35 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst index e8fa7ac9e6b1..6969652f593c 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst @@ -351,42 +351,26 @@ driver. MAC address setup ----------------- -mlx5 driver provides mechanism to setup the MAC address of the PCI VF/SF. +mlx5 driver support devlink port function attr mechanism to setup MAC +address. (refer to Documentation/networking/devlink/devlink-port.rst) -The configured MAC address of the PCI VF/SF will be used by netdevice and rdma -device created for the PCI VF/SF. +RoCE capability setup +--------------------- +Not all mlx5 PCI devices/SFs require RoCE capability. -- Get the MAC address of the VF identified by its unique devlink port index:: +When RoCE capability is disabled, it saves 1 Mbytes worth of system memory per +PCI devices/SF. - $ devlink port show pci/0000:06:00.0/2 - pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 - function: - hw_addr 00:00:00:00:00:00 - -- Set the MAC address of the VF identified by its unique devlink port index:: - - $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55 - - $ devlink port show pci/0000:06:00.0/2 - pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 - function: - hw_addr 00:11:22:33:44:55 - -- Get the MAC address of the SF identified by its unique devlink port index:: - - $ devlink port show pci/0000:06:00.0/32768 - pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 - function: - hw_addr 00:00:00:00:00:00 - -- Set the MAC address of the SF identified by its unique devlink port index:: +mlx5 driver support devlink port function attr mechanism to setup RoCE +capability. (refer to Documentation/networking/devlink/devlink-port.rst) - $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 +migratable capability setup +--------------------------- +User who wants mlx5 PCI VFs to be able to perform live migration need to +explicitly enable the VF migratable capability. - $ devlink port show pci/0000:06:00.0/32768 - pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 - function: - hw_addr 00:00:00:00:88:88 +mlx5 driver support devlink port function attr mechanism to setup migratable +capability. (refer to Documentation/networking/devlink/devlink-port.rst) SF state setup -------------- diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index 98557c2ab1c1..3da590953ce8 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -110,7 +110,7 @@ devlink ports for both the controllers. Function configuration ====================== -A user can configure the function attribute before enumerating the PCI +Users can configure one or more function attributes before enumerating the PCI function. Usually it means, user should configure function attribute before a bus specific device for the function is created. However, when SRIOV is enabled, virtual function devices are created on the PCI bus. @@ -119,9 +119,127 @@ function device to the driver. For subfunctions, this means user should configure port function attribute before activating the port function. A user may set the hardware address of the function using -'devlink port function set hw_addr' command. For Ethernet port function +`devlink port function set hw_addr` command. For Ethernet port function this means a MAC address. +Users may also set the RoCE capability of the function using +`devlink port function set roce` command. + +Users may also set the function as migratable using +'devlink port function set migratable' command. + +Function attributes +=================== + +MAC address setup +----------------- +The configured MAC address of the PCI VF/SF will be used by netdevice and rdma +device created for the PCI VF/SF. + +- Get the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55 + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:11:22:33:44:55 + +- Get the MAC address of the SF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the SF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:88:88 + +RoCE capability setup +--------------------- +Not all PCI VFs/SFs require RoCE capability. + +When RoCE capability is disabled, it saves system memory per PCI VF/SF. + +When user disables RoCE capability for a VF/SF, user application cannot send or +receive any RoCE packets through this VF/SF and RoCE GID table for this PCI +will be empty. + +When RoCE capability is disabled in the device using port function attribute, +VF/SF driver cannot override it. + +- Get RoCE capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 roce enable + +- Set RoCE capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 roce disable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 roce disable + +migratable capability setup +--------------------------- +Live migration is the process of transferring a live virtual machine +from one physical host to another without disrupting its normal +operation. + +User who want PCI VFs to be able to perform live migration need to +explicitly enable the VF migratable capability. + +When user enables migratable capability for a VF, and the HV binds the VF to VFIO driver +with migration support, the user can migrate the VM with this VF from one HV to a +different one. + +However, when migratable capability is enable, device will disable features which cannot +be migrated. Thus migratable cap can impose limitations on a VF so let the user decide. + +Example of LM with migratable function configuration: +- Get migratable capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 migratable disable + +- Set migratable capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 migratable enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 migratable enable + +- Bind VF to VFIO driver with migration support:: + + $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind + $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override + $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind + +Attach VF to the VM. +Start the VM. +Perform live migration. + Subfunction ============ diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index bede24ef44fd..f10f8eb44255 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -222,6 +222,7 @@ Userspace to kernel: ``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters ``ETHTOOL_MSG_PSE_SET`` set PSE parameters ``ETHTOOL_MSG_PSE_GET`` get PSE parameters + ``ETHTOOL_MSG_RSS_GET`` get RSS settings ===================================== ================================= Kernel to userspace: @@ -263,6 +264,7 @@ Kernel to userspace: ``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info ``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters ``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters + ``ETHTOOL_MSG_RSS_GET_REPLY`` RSS settings ======================================== ================================= ``GET`` requests are sent by userspace applications to retrieve device @@ -1687,6 +1689,33 @@ to control PoDL PSE Admin functions. This option is implementing ``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values. +RSS_GET +======= + +Get indirection table, hash key and hash function info associated with a +RSS context of an interface similar to ``ETHTOOL_GRSSH`` ioctl request. + +Request contents: + +===================================== ====== ========================== + ``ETHTOOL_A_RSS_HEADER`` nested request header + ``ETHTOOL_A_RSS_CONTEXT`` u32 context number +===================================== ====== ========================== + +Kernel response contents: + +===================================== ====== ========================== + ``ETHTOOL_A_RSS_HEADER`` nested reply header + ``ETHTOOL_A_RSS_HFUNC`` u32 RSS hash func + ``ETHTOOL_A_RSS_INDIR`` binary Indir table bytes + ``ETHTOOL_A_RSS_HKEY`` binary Hash key bytes +===================================== ====== ========================== + +ETHTOOL_A_RSS_HFUNC attribute is bitmap indicating the hash function +being used. Current supported options are toeplitz, xor or crc32. +ETHTOOL_A_RSS_INDIR attribute returns RSS indrection table where each byte +indicates queue number. + Request translation =================== @@ -1768,7 +1797,7 @@ are netlink only. ``ETHTOOL_GMODULEEEPROM`` ``ETHTOOL_MSG_MODULE_EEPROM_GET`` ``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET`` ``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET`` - ``ETHTOOL_GRSSH`` n/a + ``ETHTOOL_GRSSH`` ``ETHTOOL_MSG_RSS_GET`` ``ETHTOOL_SRSSH`` n/a ``ETHTOOL_GTUNABLE`` n/a ``ETHTOOL_STUNABLE`` n/a diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index be4eb1242057..f17c01834a12 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -179,7 +179,8 @@ SOF_TIMESTAMPING_OPT_ID: identifier and returns that along with the timestamp. The identifier is derived from a per-socket u32 counter (that wraps). For datagram sockets, the counter increments with each sent packet. For stream - sockets, it increments with every byte. + sockets, it increments with every byte. For stream sockets, also set + SOF_TIMESTAMPING_OPT_ID_TCP, see the section below. The counter starts at zero. It is initialized the first time that the socket option is enabled. It is reset each time the option is @@ -192,6 +193,35 @@ SOF_TIMESTAMPING_OPT_ID: among all possibly concurrently outstanding timestamp requests for that socket. +SOF_TIMESTAMPING_OPT_ID_TCP: + Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP + timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the + counter increments for stream sockets, but its starting point is + not entirely trivial. This option fixes that. + + For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should + always be set too. On datagram sockets the option has no effect. + + A reasonable expectation is that the counter is reset to zero with + the system call, so that a subsequent write() of N bytes generates + a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP + implements this behavior under all conditions. + + SOF_TIMESTAMPING_OPT_ID without modifier often reports the same, + especially when the socket option is set when no data is in + transmission. If data is being transmitted, it may be off by the + length of the output queue (SIOCOUTQ). + + The difference is due to being based on snd_una versus write_seq. + snd_una is the offset in the stream acknowledged by the peer. This + depends on factors outside of process control, such as network RTT. + write_seq is the last byte written by the process. This offset is + not affected by external inputs. + + The difference is subtle and unlikely to be noticed when configured + at initial socket creation, when no data is queued or sent. But + SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of + when the socket option is set. SOF_TIMESTAMPING_OPT_CMSG: Support recv() cmsg for all timestamped packets. Control messages |