1 files changed, 209 insertions, 29 deletions
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index c5642f430d2e..ff1e6a8ffe21 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -313,8 +313,8 @@ https://lwn.net/Articles/576263/
 
 * TcpExtTCPOrigDataSent
 
-This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
-explaination below::
+This counter is explained by kernel commit f19c29e3e391, I pasted the
+explanation below::
 
   TCPOrigDataSent: number of outgoing packets with original data (excluding
   retransmission but including data-in-SYN). This counter is different from
@@ -323,22 +323,20 @@ explaination below::
 
 * TCPSynRetrans
 
-This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
-explaination below::
+This counter is explained by kernel commit f19c29e3e391, I pasted the
+explanation below::
 
   TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
   retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
 
 * TCPFastOpenActiveFail
 
-This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
-explaination below::
+This counter is explained by kernel commit f19c29e3e391, I pasted the
+explanation below::
 
   TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
   the remote does not accept it or the attempts timed out.
 
-.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
-
 * TcpExtListenOverflows and TcpExtListenDrops
 
 When kernel receives a SYN from a client, and if the TCP accept queue
@@ -367,26 +365,53 @@ to the accept queue.
 TCP Fast Open
 =============
 * TcpEstabResets
+
 Defined in `RFC1213 tcpEstabResets`_.
 
 .. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
 
 * TcpAttemptFails
+
 Defined in `RFC1213 tcpAttemptFails`_.
 
 .. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
 
 * TcpOutRsts
+
 Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
 the 'segments sent containing the RST flag', but in linux kernel, this
-couner indicates the segments kerenl tried to send. The sending
+counter indicates the segments kernel tried to send. The sending
 process might be failed due to some errors (e.g. memory alloc failed).
 
 .. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
 
+* TcpExtTCPSpuriousRtxHostQueues
+
+When the TCP stack wants to retransmit a packet, and finds that packet
+is not lost in the network, but the packet is not sent yet, the TCP
+stack would give up the retransmission and update this counter. It
+might happen if a packet stays too long time in a qdisc or driver
+queue.
+
+* TcpEstabResets
+
+The socket receives a RST packet in Establish or CloseWait state.
+
+* TcpExtTCPKeepAlive
+
+This counter indicates many keepalive packets were sent. The keepalive
+won't be enabled by default. A userspace program could enable it by
+setting the SO_KEEPALIVE socket option.
+
+* TcpExtTCPSpuriousRTOs
+
+The spurious retransmission timeout detected by the `F-RTO`_
+algorithm.
+
+.. _F-RTO: https://tools.ietf.org/html/rfc5682
 
 TCP Fast Path
-============
+=============
 When kernel receives a TCP packet, it has two paths to handler the
 packet, one is fast path, another is slow path. The comment in kernel
 code provides a good explanation of them, I pasted them below::
@@ -609,6 +634,29 @@ packet yet, the sender would know packet 4 is out of order. The TCP
 stack of kernel will increase TcpExtTCPSACKReorder for both of the
 above scenarios.
 
+* TcpExtTCPSlowStartRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is 'Loss'.
+
+* TcpExtTCPFastRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is not 'Loss'.
+
+* TcpExtTCPLostRetransmit
+
+A SACK points out that a retransmission packet is lost again.
+
+* TcpExtTCPRetransFail
+
+The TCP stack tries to deliver a retransmission packet to lower layers
+but the lower layers return an error.
+
+* TcpExtTCPSynRetrans
+
+The TCP stack retransmits a SYN packet.
+
 DSACK
 =====
 The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@@ -631,6 +679,7 @@ The TCP stack receives an out of order duplicate packet, so it sends a
 DSACK to the sender.
 
 * TcpExtTCPDSACKRecv
+
 The TCP stack receives a DSACK, which indicates an acknowledged
 duplicate packet is received.
 
@@ -640,25 +689,25 @@ The TCP stack receives a DSACK, which indicate an out of order
 duplicate packet is received.
 
 invalid SACK and DSACK
-====================
+======================
 When a SACK (or DSACK) block is invalid, a corresponding counter would
 be updated. The validation method is base on the start/end sequence
 number of the SACK block. For more details, please refer the comment
 of the function tcp_is_sackblock_valid in the kernel source code. A
 SACK option could have up to 4 blocks, they are checked
 individually. E.g., if 3 blocks of a SACk is invalid, the
-corresponding counter would be updated 3 times. The comment of the
-`Add counters for discarded SACK blocks`_ patch has additional
-explaination:
-
-.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32
+corresponding counter would be updated 3 times. The comment of commit
+18f02545a9a1 ("[TCP] MIB: Add counters for discarded SACK blocks")
+has additional explanation:
 
 * TcpExtTCPSACKDiscard
+
 This counter indicates how many SACK blocks are invalid. If the invalid
 SACK block is caused by ACK recording, the TCP stack will only ignore
 it and won't update this counter.
 
 * TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo
+
 When a DSACK block is invalid, one of these two counters would be
 updated. Which counter will be updated depends on the undo_marker flag
 of the TCP socket. If the undo_marker is not set, the TCP stack isn't
@@ -669,7 +718,7 @@ will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld
 will be updated. As implied in its name, it might be an old packet.
 
 SACK shift
-=========
+==========
 The linux networking stack stores data in sk_buff struct (skb for
 short). If a SACK block acrosses multiple skb, the TCP stack will try
 to re-arrange data in these skb. E.g. if a SACK block acknowledges seq
@@ -680,12 +729,15 @@ seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be
 discard, this operation is 'merge'.
 
 * TcpExtTCPSackShifted
+
 A skb is shifted
 
 * TcpExtTCPSackMerged
+
 A skb is merged
 
 * TcpExtTCPSackShiftFallback
+
 A skb should be shifted or merged, but the TCP stack doesn't do it for
 some reasons.
 
@@ -736,7 +788,7 @@ counters to indicate the ACK is skipped in which scenario. The ACK
 would only be skipped if the received packet is either a SYN packet or
 it has no data.
 
-.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
+.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
 
 * TcpExtTCPACKSkippedSynRecv
 
@@ -773,7 +825,7 @@ PAWS check fails or the received sequence number is out of window.
 
 * TcpExtTCPACKSkippedTimeWait
 
-Tha ACK is skipped in Time-Wait status, the reason would be either
+The ACK is skipped in Time-Wait status, the reason would be either
 PAWS check failed or the received sequence number is out of window.
 
 * TcpExtTCPACKSkippedChallenge
@@ -790,8 +842,9 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
 .. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
 
 TCP receive window
-=================
+==================
 * TcpExtTCPWantZeroWindowAdv
+
 Depending on current memory usage, the TCP stack tries to set receive
 window to zero. But the receive window might still be a no-zero
 value. For example, if the previous window size is 10, and the TCP
@@ -799,14 +852,16 @@ stack receives 3 bytes, the current window size would be 7 even if the
 window size calculated by the memory usage is zero.
 
 * TcpExtTCPToZeroWindowAdv
+
 The TCP receive window is set to zero from a no-zero value.
 
 * TcpExtTCPFromZeroWindowAdv
+
 The TCP receive window is set to no-zero value from zero.
 
 
 Delayed ACK
-==========
+===========
 The TCP Delayed ACK is a technique which is used for reducing the
 packet count in the network. For more details, please refer the
 `Delayed ACK wiki`_
@@ -814,10 +869,12 @@ packet count in the network. For more details, please refer the
 .. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
 
 * TcpExtDelayedACKs
+
 A delayed ACK timer expires. The TCP stack will send a pure ACK packet
 and exit the delayed ACK mode.
 
 * TcpExtDelayedACKLocked
+
 A delayed ACK timer expires, but the TCP stack can't send an ACK
 immediately due to the socket is locked by a userspace program. The
 TCP stack will send a pure ACK later (after the userspace program
@@ -826,24 +883,147 @@ TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
 mode.
 
 * TcpExtDelayedACKLost
+
 It will be updated when the TCP stack receives a packet which has been
 ACKed. A Delayed ACK loss might cause this issue, but it would also be
 triggered by other reasons, such as a packet is duplicated in the
 network.
 
 Tail Loss Probe (TLP)
-===================
+=====================
 TLP is an algorithm which is used to detect TCP packet loss. For more
 details, please refer the `TLP paper`_.
 
 .. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
 
 * TcpExtTCPLossProbes
+
 A TLP probe packet is sent.
 
 * TcpExtTCPLossProbeRecovery
+
 A packet loss is detected and recovered by TLP.
 
+TCP Fast Open description
+=========================
+TCP Fast Open is a technology which allows data transfer before the
+3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
+general description.
+
+.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
+
+* TcpExtTCPFastOpenActive
+
+When the TCP stack receives an ACK packet in the SYN-SENT status, and
+the ACK packet acknowledges the data in the SYN packet, the TCP stack
+understand the TFO cookie is accepted by the other side, then it
+updates this counter.
+
+* TcpExtTCPFastOpenActiveFail
+
+This counter indicates that the TCP stack initiated a TCP Fast Open,
+but it failed. This counter would be updated in three scenarios: (1)
+the other side doesn't acknowledge the data in the SYN packet. (2) The
+SYN packet which has the TFO cookie is timeout at least once. (3)
+after the 3-way handshake, the retransmission timeout happens
+net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
+fast open after the handshake.
+
+* TcpExtTCPFastOpenPassive
+
+This counter indicates how many times the TCP stack accepts the fast
+open request.
+
+* TcpExtTCPFastOpenPassiveFail
+
+This counter indicates how many times the TCP stack rejects the fast
+open request. It is caused by either the TFO cookie is invalid or the
+TCP stack finds an error during the socket creating process.
+
+* TcpExtTCPFastOpenListenOverflow
+
+When the pending fast open request number is larger than
+fastopenq->max_qlen, the TCP stack will reject the fast open request
+and update this counter. When this counter is updated, the TCP stack
+won't update TcpExtTCPFastOpenPassive or
+TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
+TCP_FASTOPEN socket operation and it could not be larger than
+net.core.somaxconn. For example:
+
+setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
+
+* TcpExtTCPFastOpenCookieReqd
+
+This counter indicates how many times a client wants to request a TFO
+cookie.
+
+SYN cookies
+===========
+SYN cookies are used to mitigate SYN flood, for details, please refer
+the `SYN cookies wiki`_.
+
+.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
+
+* TcpExtSyncookiesSent
+
+It indicates how many SYN cookies are sent.
+
+* TcpExtSyncookiesRecv
+
+How many reply packets of the SYN cookies the TCP stack receives.
+
+* TcpExtSyncookiesFailed
+
+The MSS decoded from the SYN cookie is invalid. When this counter is
+updated, the received packet won't be treated as a SYN cookie and the
+TcpExtSyncookiesRecv counter won't be updated.
+
+Challenge ACK
+=============
+For details of challenge ACK, please refer the explanation of
+TcpExtTCPACKSkippedChallenge.
+
+* TcpExtTCPChallengeACK
+
+The number of challenge acks sent.
+
+* TcpExtTCPSYNChallenge
+
+The number of challenge acks sent in response to SYN packets. After
+updates this counter, the TCP stack might send a challenge ACK and
+update the TcpExtTCPChallengeACK counter, or it might also skip to
+send the challenge and update the TcpExtTCPACKSkippedChallenge.
+
+prune
+=====
+When a socket is under memory pressure, the TCP stack will try to
+reclaim memory from the receiving queue and out of order queue. One of
+the reclaiming method is 'collapse', which means allocate a big skb,
+copy the contiguous skbs to the single big skb, and free these
+contiguous skbs.
+
+* TcpExtPruneCalled
+
+The TCP stack tries to reclaim memory for a socket. After updates this
+counter, the TCP stack will try to collapse the out of order queue and
+the receiving queue. If the memory is still not enough, the TCP stack
+will try to discard packets from the out of order queue (and update the
+TcpExtOfoPruned counter)
+
+* TcpExtOfoPruned
+
+The TCP stack tries to discard packet on the out of order queue.
+
+* TcpExtRcvPruned
+
+After 'collapse' and discard packets from the out of order queue, if
+the actually used memory is still larger than the max allowed memory,
+this counter will be updated. It means the 'prune' fails.
+
+* TcpExtTCPRcvCollapsed
+
+This counter indicates how many skbs are freed during 'collapse'.
+
 examples
 ========
 
@@ -979,7 +1159,7 @@ The server side nstat output::
   IpExtOutOctets                  52                 0.0
   IpExtInNoECTPkts                1                  0.0
 
-Input a string in nc client side again ('world' in our exmaple)::
+Input a string in nc client side again ('world' in our example)::
 
   nstatuser@nstat-a:~$ nc -v nstat-b 9000
   Connection to nstat-b 9000 port [tcp/*] succeeded!
@@ -1027,7 +1207,7 @@ replied an ACK. But kernel handled them in different ways. When the
 TCP window scale option is not used, kernel will try to enable fast
 path immediately when the connection comes into the established state,
 but if the TCP window scale option is used, kernel will disable the
-fast path at first, and try to enable it after kerenl receives
+fast path at first, and try to enable it after kernel receives
 packets. We could use the 'ss' command to verify whether the window
 scale option is used. e.g. run below command on either server or
 client::
@@ -1159,7 +1339,7 @@ Check TcpExtTCPAbortOnMemory on client::
   nstatuser@nstat-a:~$ nstat | grep -i abort
   TcpExtTCPAbortOnMemory          54                 0.0
 
-Check orphane socket count on client::
+Check orphaned socket count on client::
 
   nstatuser@nstat-a:~$ ss -s
   Total: 131 (kernel 0)
@@ -1497,11 +1677,11 @@ RST to nstat-b::
 
   nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP
 
-Send 3 SYN repeatly to nstat-b::
+Send 3 SYN repeatedly to nstat-b::
 
   nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done
 
-Check snmp cunter on nstat-b::
+Check snmp counter on nstat-b::
 
   nstatuser@nstat-b:~$ nstat | grep -i skip
   TcpExtTCPACKSkippedSynRecv      1                  0.0
@@ -1586,7 +1766,7 @@ string 'foo' in our example::
   Connection from nstat-a 42132 received!
   foo
 
-On nstat-a, the tcpdump should have caputred the ACK. We should check
+On nstat-a, the tcpdump should have captured the ACK. We should check
 the source port numbers of the two nc clients::
 
   nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
@@ -1594,7 +1774,7 @@ the source port numbers of the two nc clients::
   ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
   ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001
 
-Run tcprewrite, change port 9001 to port 9000, chagne port 42132 to
+Run tcprewrite, change port 9001 to port 9000, change port 42132 to
 port 50208::
 
   nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum