Diffstat (limited to 'Documentation/networking/scaling.rst')
 Documentation/networking/scaling.rst | 74 +++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 3d435caa3ef2..99b6a61e5e31 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -44,6 +44,28 @@ by masking out the low order seven bits of the computed hash for the
 packet (usually a Toeplitz hash), taking this number as a key into the
 indirection table and reading the corresponding value.
 
+Some NICs support symmetric RSS hashing where, if the IP (source address,
+destination address) and TCP/UDP (source port, destination port) tuples
+are swapped, the computed hash is the same. This is beneficial in some
+applications that monitor TCP/IP flows (IDS, firewalls, ...etc) and need
+both directions of the flow to land on the same Rx queue (and CPU). The
+"Symmetric-XOR" and "Symmetric-OR-XOR" are types of RSS algorithms that
+achieve this hash symmetry by XOR/ORing the input source and destination
+fields of the IP and/or L4 protocols. This, however, results in reduced
+input entropy and could potentially be exploited.
+
+Specifically, the "Symmetric-XOR" algorithm XORs the input
+as follows::
+
+    # (SRC_IP ^ DST_IP, SRC_IP ^ DST_IP, SRC_PORT ^ DST_PORT, SRC_PORT ^ DST_PORT)
+
+The "Symmetric-OR-XOR" algorithm, on the other hand, transforms the input as
+follows::
+
+    # (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)
+
+The result is then fed to the underlying RSS algorithm.
+
 Some advanced NICs allow steering packets to queues based on
 programmable filters. For example, webserver bound TCP port 80 packets
 can be directed to their own receive queue. Such "n-tuple" filters can
@@ -105,6 +127,48 @@ a separate CPU. For interrupt handling, HT has shown no benefit in
 initial tests, so limit the number of queues to the number of CPU cores
 in the system.
 
+Dedicated RSS contexts
+~~~~~~~~~~~~~~~~~~~~~~
+
+Modern NICs support creating multiple co-existing RSS configurations
+which are selected based on explicit matching rules. This can be very
+useful when application wants to constrain the set of queues receiving
+traffic for e.g. a particular destination port or IP address.
+The example below shows how to direct all traffic to TCP port 22
+to queues 0 and 1.
+
+To create an additional RSS context use::
+
+  # ethtool -X eth0 hfunc toeplitz context new
+  New RSS context is 1
+
+Kernel reports back the ID of the allocated context (the default, always
+present RSS context has ID of 0). The new context can be queried and
+modified using the same APIs as the default context::
+
+  # ethtool -x eth0 context 1
+  RX flow hash indirection table for eth0 with 13 RX ring(s):
+      0:      0     1     2     3     4     5     6     7
+      8:      8     9    10    11    12     0     1     2
+  [...]
+  # ethtool -X eth0 equal 2 context 1
+  # ethtool -x eth0 context 1
+  RX flow hash indirection table for eth0 with 13 RX ring(s):
+      0:      0     1     0     1     0     1     0     1
+      8:      0     1     0     1     0     1     0     1
+  [...]
+
+To make use of the new context direct traffic to it using an n-tuple
+filter::
+
+  # ethtool -N eth0 flow-type tcp6 dst-port 22 context 1
+  Added rule with ID 1023
+
+When done, remove the context and the rule::
+
+  # ethtool -N eth0 delete 1023
+  # ethtool -X eth0 context 1 delete
+
 RPS: Receive Packet Steering
 ============================
 
@@ -269,8 +333,8 @@ a single application thread handles flows with many different flow hashes.
 rps_sock_flow_table is a global flow table that contains the *desired* CPU
 for flows: the CPU that is currently processing the flow in userspace.
 Each table value is a CPU index that is updated during calls to recvmsg
-and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
-and tcp_splice_read()).
+and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and
+tcp_splice_read()).
 
 When the scheduler moves a thread to a new CPU while it has outstanding
 receive packets on the old CPU, packets may arrive out of order. To
@@ -370,8 +434,10 @@ rps_dev_flow_table. The stack consults a CPU to hardware queue map which
 is maintained by the NIC driver. This is an auto-generated reverse map of
 the IRQ affinity table shown by /proc/interrupts. Drivers can use functions
 in the cpu_rmap ("CPU affinity reverse map") kernel library
-to populate the map. For each CPU, the corresponding queue in the map is
-set to be one whose processing CPU is closest in cache locality.
+to populate the map. Alternatively, drivers can delegate the cpu_rmap
+management to the Kernel by calling netif_enable_cpu_rmap(). For each CPU,
+the corresponding queue in the map is set to be one whose processing CPU is
+closest in cache locality.
 
 Accelerated RFS Configuration
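As background for the context lines of the first hunk, the queue lookup they
describe (the low-order seven bits of the computed hash used as a key into a
128-entry indirection table) can be modelled with a short userspace sketch.
This is illustrative only, assuming a 128-entry table spread evenly over eight
queues; the names are invented for the example, and the real table lives in
the NIC and is programmed by the driver or via ethtool -X::

  #include <stdint.h>
  #include <stdio.h>

  #define RSS_INDIR_SIZE 128  /* 2^7 entries, indexed by the low 7 hash bits */

  /* Queue number per table entry; normally programmed by the driver or
   * changed at runtime with "ethtool -X". */
  static uint16_t indir_table[RSS_INDIR_SIZE];

  /* Select the Rx queue from a packet's (typically Toeplitz) hash. */
  static uint16_t rss_select_queue(uint32_t pkt_hash)
  {
      return indir_table[pkt_hash & (RSS_INDIR_SIZE - 1)];
  }

  int main(void)
  {
      unsigned int i, nqueues = 8;   /* assume 8 Rx queues for the example */

      /* Spread the queues evenly over the table, the usual default. */
      for (i = 0; i < RSS_INDIR_SIZE; i++)
          indir_table[i] = i % nqueues;

      printf("hash 0x12345678 -> queue %u\n",
             (unsigned int)rss_select_queue(0x12345678));
      return 0;
  }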
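The Symmetric-XOR and Symmetric-OR-XOR transformations documented by the first
hunk can likewise be sketched in userspace to show why both directions of a
flow produce the same hash. The sketch only follows the two formulas quoted in
the patch; the struct and function names are invented for the example, and
real devices apply the transformation in hardware before the Toeplitz hash::

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical 4-tuple; real hardware works on the raw header fields. */
  struct flow_tuple {
      uint32_t src_ip, dst_ip;      /* IPv4 addresses */
      uint16_t src_port, dst_port;  /* L4 ports */
  };

  /* (SRC_IP ^ DST_IP, SRC_IP ^ DST_IP, SRC_PORT ^ DST_PORT, SRC_PORT ^ DST_PORT) */
  static struct flow_tuple symmetric_xor(struct flow_tuple t)
  {
      struct flow_tuple r;

      r.src_ip = r.dst_ip = t.src_ip ^ t.dst_ip;
      r.src_port = r.dst_port = t.src_port ^ t.dst_port;
      return r;
  }

  /* (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT) */
  static struct flow_tuple symmetric_or_xor(struct flow_tuple t)
  {
      struct flow_tuple r;

      r.src_ip = t.src_ip | t.dst_ip;
      r.dst_ip = t.src_ip ^ t.dst_ip;
      r.src_port = t.src_port | t.dst_port;
      r.dst_port = t.src_port ^ t.dst_port;
      return r;
  }

  static int tuple_eq(struct flow_tuple a, struct flow_tuple b)
  {
      return a.src_ip == b.src_ip && a.dst_ip == b.dst_ip &&
             a.src_port == b.src_port && a.dst_port == b.dst_port;
  }

  int main(void)
  {
      struct flow_tuple fwd = { 0xc0a80001, 0xc0a80002, 443, 51000 };
      struct flow_tuple rev = { fwd.dst_ip, fwd.src_ip, fwd.dst_port, fwd.src_port };

      /* Both directions produce the same transformed input, hence the
       * same RSS hash and the same Rx queue. */
      printf("Symmetric-XOR symmetric:    %d\n",
             tuple_eq(symmetric_xor(fwd), symmetric_xor(rev)));
      printf("Symmetric-OR-XOR symmetric: %d\n",
             tuple_eq(symmetric_or_xor(fwd), symmetric_or_xor(rev)));
      return 0;
  }

Because OR and XOR are both commutative, swapping the source and destination
fields leaves the transformed tuple unchanged, so the hash computed over it,
and therefore the selected Rx queue, is the same for both directions of the
flow.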
