Diffstat (limited to 'Documentation/networking/scaling.rst'):
 Documentation/networking/scaling.rst | 74
 1 file changed, 70 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 3d435caa3ef2..99b6a61e5e31 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -44,6 +44,28 @@ by masking out the low order seven bits of the computed hash for the
packet (usually a Toeplitz hash), taking this number as a key into the
indirection table and reading the corresponding value.
+Some NICs support symmetric RSS hashing where, if the IP (source address,
+destination address) and TCP/UDP (source port, destination port) tuples
+are swapped, the computed hash is the same. This is beneficial for
+applications that monitor TCP/IP flows (IDS, firewalls, etc.) and need
+both directions of the flow to land on the same Rx queue (and CPU). The
+"Symmetric-XOR" and "Symmetric-OR-XOR" RSS algorithms achieve this hash
+symmetry by XORing/ORing the source and destination fields of the IP
+and/or L4 headers. This, however, reduces the input entropy and could
+potentially be exploited.
+
+Specifically, the "Symmetric-XOR" algorithm XORs the input
+as follows::
+
+ # (SRC_IP ^ DST_IP, SRC_IP ^ DST_IP, SRC_PORT ^ DST_PORT, SRC_PORT ^ DST_PORT)
+
+The "Symmetric-OR-XOR" algorithm, on the other hand, transforms the input as
+follows::
+
+ # (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)
+
+The result is then fed to the underlying RSS algorithm.
+
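+On NICs and ethtool versions that support these transforms, the input
+transform can typically be selected with the xfrm option of ethtool -X
+(driver support varies; the device name below is only an example)::
+
+ # ethtool -X eth0 xfrm symmetric-xor
+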
Some advanced NICs allow steering packets to queues based on
programmable filters. For example, webserver bound TCP port 80 packets
can be directed to their own receive queue. Such “n-tuple” filters can
@@ -105,6 +127,48 @@ a separate CPU. For interrupt handling, HT has shown no benefit in
initial tests, so limit the number of queues to the number of CPU cores
in the system.
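+For example, on a system with eight CPU cores the queue count can usually
+be set through the channel interface, assuming the driver exposes combined
+channels (device name and count are illustrative)::
+
+ # ethtool -L eth0 combined 8
+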
+Dedicated RSS contexts
+~~~~~~~~~~~~~~~~~~~~~~
+
+Modern NICs support creating multiple co-existing RSS configurations
+which are selected based on explicit matching rules. This can be very
+useful when an application wants to constrain the set of queues receiving
+traffic for e.g. a particular destination port or IP address.
+The example below shows how to direct all traffic to TCP port 22
+to queues 0 and 1.
+
+To create an additional RSS context use::
+
+ # ethtool -X eth0 hfunc toeplitz context new
+ New RSS context is 1
+
+The kernel reports back the ID of the allocated context (the default,
+always present RSS context has ID 0). The new context can be queried and
+modified using the same APIs as the default context::
+
+ # ethtool -x eth0 context 1
+ RX flow hash indirection table for eth0 with 13 RX ring(s):
+ 0: 0 1 2 3 4 5 6 7
+ 8: 8 9 10 11 12 0 1 2
+ [...]
+ # ethtool -X eth0 equal 2 context 1
+ # ethtool -x eth0 context 1
+ RX flow hash indirection table for eth0 with 13 RX ring(s):
+ 0: 0 1 0 1 0 1 0 1
+ 8: 0 1 0 1 0 1 0 1
+ [...]
+
+To make use of the new context, direct traffic to it using an n-tuple
+filter::
+
+ # ethtool -N eth0 flow-type tcp6 dst-port 22 context 1
+ Added rule with ID 1023
+
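+The installed steering rules, including the rule IDs needed to remove them
+later, can be listed with::
+
+ # ethtool -n eth0
+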
+When done, remove the context and the rule::
+
+ # ethtool -N eth0 delete 1023
+ # ethtool -X eth0 context 1 delete
+
RPS: Receive Packet Steering
============================
@@ -269,8 +333,8 @@ a single application thread handles flows with many different flow hashes.
rps_sock_flow_table is a global flow table that contains the *desired* CPU
for flows: the CPU that is currently processing the flow in userspace.
Each table value is a CPU index that is updated during calls to recvmsg
-and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
-and tcp_splice_read()).
+and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and
+tcp_splice_read()).
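+
+The number of entries in this table, and the matching per-queue flow counts,
+are configured through net.core.rps_sock_flow_entries and rps_flow_cnt as
+described in the RFS Configuration section below; for illustration only
+(device and values are examples)::
+
+ # echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
+ # echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
+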
When the scheduler moves a thread to a new CPU while it has outstanding
receive packets on the old CPU, packets may arrive out of order. To
@@ -370,8 +434,10 @@ rps_dev_flow_table. The stack consults a CPU to hardware queue map which
is maintained by the NIC driver. This is an auto-generated reverse map of
the IRQ affinity table shown by /proc/interrupts. Drivers can use
functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
-to populate the map. For each CPU, the corresponding queue in the map is
-set to be one whose processing CPU is closest in cache locality.
+to populate the map. Alternatively, drivers can delegate the cpu_rmap
+management to the kernel by calling netif_enable_cpu_rmap(). For each CPU,
+the corresponding queue in the map is set to be one whose processing CPU is
+closest in cache locality.
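+
+The IRQ affinities that this reverse map is built from can be inspected and
+adjusted from userspace, for example (interrupt number, device name and CPU
+mask are illustrative)::
+
+ # grep eth0 /proc/interrupts
+ # echo 4 > /proc/irq/30/smp_affinity
+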
Accelerated RFS Configuration