summaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/bonding.txt2
-rw-r--r--Documentation/networking/cdc_mbim.txt4
-rw-r--r--Documentation/networking/checksum-offloads.txt2
-rw-r--r--Documentation/networking/dsa/lan9303.txt37
-rw-r--r--Documentation/networking/filter.txt2
-rw-r--r--Documentation/networking/gtp.txt103
-rw-r--r--Documentation/networking/ila.txt285
-rw-r--r--Documentation/networking/ip-sysctl.txt55
-rw-r--r--Documentation/networking/ipvlan.txt42
-rw-r--r--Documentation/networking/netdev-FAQ.txt5
-rw-r--r--Documentation/networking/netvsc.txt8
-rw-r--r--Documentation/networking/packet_mmap.txt2
-rw-r--r--Documentation/networking/regulatory.txt30
-rw-r--r--Documentation/networking/rxrpc.txt53
-rw-r--r--Documentation/networking/switchdev.txt68
-rw-r--r--Documentation/networking/vrf.txt13
16 files changed, 626 insertions, 85 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 57f52cdce32e..9ba04c0bab8d 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -2387,7 +2387,7 @@ broadcast: Like active-backup, there is not much advantage to this
and packet type ID), so in a "gatewayed" configuration, all
outgoing traffic will generally use the same device. Incoming
traffic may also end up on a single device, but that is
- dependent upon the balancing policy of the peer's 8023.ad
+ dependent upon the balancing policy of the peer's 802.3ad
implementation. In a "local" configuration, traffic will be
distributed across the devices in the bond.
diff --git a/Documentation/networking/cdc_mbim.txt b/Documentation/networking/cdc_mbim.txt
index e4c376abbdad..4e68f0bc5dba 100644
--- a/Documentation/networking/cdc_mbim.txt
+++ b/Documentation/networking/cdc_mbim.txt
@@ -332,8 +332,8 @@ References
[5] "MBIM (Mobile Broadband Interface Model) Registry"
- http://compliance.usb.org/mbim/
-[6] "/dev/bus/usb filesystem output"
- - Documentation/usb/proc_usb_info.txt
+[6] "/sys/kernel/debug/usb/devices output format"
+ - Documentation/driver-api/usb/usb.rst
[7] "/sys/bus/usb/devices/.../descriptors"
- Documentation/ABI/stable/sysfs-bus-usb
diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt
index d52d191bbb0c..27bc09cfcf6d 100644
--- a/Documentation/networking/checksum-offloads.txt
+++ b/Documentation/networking/checksum-offloads.txt
@@ -47,7 +47,7 @@ The requirements for GSO are more complicated, because when segmenting an
(section 'E') for more details.
A driver declares its offload capabilities in netdev->hw_features; see
- Documentation/networking/netdev-features for more. Note that a device
+ Documentation/networking/netdev-features.txt for more. Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start
and csum_offset given in the SKB; if it tries to deduce these itself in
hardware (as some NICs do) the driver should check that the values in the
diff --git a/Documentation/networking/dsa/lan9303.txt b/Documentation/networking/dsa/lan9303.txt
new file mode 100644
index 000000000000..144b02b95207
--- /dev/null
+++ b/Documentation/networking/dsa/lan9303.txt
@@ -0,0 +1,37 @@
+LAN9303 Ethernet switch driver
+==============================
+
+The LAN9303 is a three port 10/100 Mbps ethernet switch with integrated phys for
+the two external ethernet ports. The third port is an RMII/MII interface to a
+host master network interface (e.g. fixed link).
+
+
+Driver details
+==============
+
+The driver is implemented as a DSA driver, see
+Documentation/networking/dsa/dsa.txt.
+
+See Documentation/devicetree/bindings/net/dsa/lan9303.txt for device tree
+binding.
+
+The LAN9303 can be managed both via MDIO and I2C, both supported by this driver.
+
+At startup the driver configures the device to provide two separate network
+interfaces (which is the default state of a DSA device). Due to HW limitations,
+no HW MAC learning takes place in this mode.
+
+When both user ports are joined to the same bridge, the normal HW MAC learning
+is enabled. This means that unicast traffic is forwarded in HW. Broadcast and
+multicast is flooded in HW. STP is also supported in this mode. The driver
+support fdb/mdb operations as well, meaning IGMP snooping is supported.
+
+If one of the user ports leave the bridge, the ports goes back to the initial
+separated operation.
+
+
+Driver limitations
+==================
+
+ - Support for VLAN filtering is not implemented
+ - The HW does not support VLAN-specific fdb entries
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 789b74dbe1d9..87814859cfc2 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -337,7 +337,7 @@ Examples for low-level BPF:
jeq #14, good /* __NR_rt_sigprocmask */
jeq #13, good /* __NR_rt_sigaction */
jeq #35, good /* __NR_nanosleep */
- bad: ret #0 /* SECCOMP_RET_KILL */
+ bad: ret #0 /* SECCOMP_RET_KILL_THREAD */
good: ret #0x7fff0000 /* SECCOMP_RET_ALLOW */
The above example code can be placed into a file (here called "foo"), and
diff --git a/Documentation/networking/gtp.txt b/Documentation/networking/gtp.txt
index 93e96750f103..0d9c18f05ec6 100644
--- a/Documentation/networking/gtp.txt
+++ b/Documentation/networking/gtp.txt
@@ -1,6 +1,7 @@
The Linux kernel GTP tunneling module
======================================================================
-Documentation by Harald Welte <laforge@gnumonks.org>
+Documentation by Harald Welte <laforge@gnumonks.org> and
+ Andreas Schultz <aschultz@tpip.net>
In 'drivers/net/gtp.c' you are finding a kernel-level implementation
of a GTP tunnel endpoint.
@@ -91,9 +92,13 @@ http://git.osmocom.org/libgtpnl/
== Protocol Versions ==
-There are two different versions of GTP-U: v0 and v1. Both are
-implemented in the Kernel GTP module. Version 0 is a legacy version,
-and deprecated from recent 3GPP specifications.
+There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1
+[3GPP TS 29.281]. Both are implemented in the Kernel GTP module.
+Version 0 is a legacy version, and deprecated from recent 3GPP
+specifications.
+
+GTP-U uses UDP for transporting PDUs. The receiving UDP port is 2151
+for GTPv1-U and 3386 for GTPv0-U.
There are three versions of GTP-C: v0, v1, and v2. As the kernel
doesn't implement GTP-C, we don't have to worry about this. It's the
@@ -133,3 +138,93 @@ doe to a lack of user interest, it never got merged.
In 2015, Andreas Schultz came to the rescue and fixed lots more bugs,
extended it with new features and finally pushed all of us to get it
mainline, where it was merged in 4.7.0.
+
+== Architectural Details ==
+
+=== Local GTP-U entity and tunnel identification ===
+
+GTP-U uses UDP for transporting PDU's. The receiving UDP port is 2152
+for GTPv1-U and 3386 for GTPv0-U.
+
+There is only one GTP-U entity (and therefor SGSN/GGSN/S-GW/PDN-GW
+instance) per IP address. Tunnel Endpoint Identifier (TEID) are unique
+per GTP-U entity.
+
+A specific tunnel is only defined by the destination entity. Since the
+destination port is constant, only the destination IP and TEID define
+a tunnel. The source IP and Port have no meaning for the tunnel.
+
+Therefore:
+
+ * when sending, the remote entity is defined by the remote IP and
+ the tunnel endpoint id. The source IP and port have no meaning and
+ can be changed at any time.
+
+ * when receiving the local entity is defined by the local
+ destination IP and the tunnel endpoint id. The source IP and port
+ have no meaning and can change at any time.
+
+[3GPP TS 29.281] Section 4.3.0 defines this so:
+
+> The TEID in the GTP-U header is used to de-multiplex traffic
+> incoming from remote tunnel endpoints so that it is delivered to the
+> User plane entities in a way that allows multiplexing of different
+> users, different packet protocols and different QoS levels.
+> Therefore no two remote GTP-U endpoints shall send traffic to a
+> GTP-U protocol entity using the same TEID value except
+> for data forwarding as part of mobility procedures.
+
+The definition above only defines that two remote GTP-U endpoints
+*should not* send to the same TEID, it *does not* forbid or exclude
+such a scenario. In fact, the mentioned mobility procedures make it
+necessary that the GTP-U entity accepts traffic for TEIDs from
+multiple or unknown peers.
+
+Therefore, the receiving side identifies tunnels exclusively based on
+TEIDs, not based on the source IP!
+
+== APN vs. Network Device ==
+
+The GTP-U driver creates a Linux network device for each Gi/SGi
+interface.
+
+[3GPP TS 29.281] calls the Gi/SGi reference point an interface. This
+may lead to the impression that the GGSN/P-GW can have only one such
+interface.
+
+Correct is that the Gi/SGi reference point defines the interworking
+between +the 3GPP packet domain (PDN) based on GTP-U tunnel and IP
+based networks.
+
+There is no provision in any of the 3GPP documents that limits the
+number of Gi/SGi interfaces implemented by a GGSN/P-GW.
+
+[3GPP TS 29.061] Section 11.3 makes it clear that the selection of a
+specific Gi/SGi interfaces is made through the Access Point Name
+(APN):
+
+> 2. each private network manages its own addressing. In general this
+> will result in different private networks having overlapping
+> address ranges. A logically separate connection (e.g. an IP in IP
+> tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW
+> and each private network.
+>
+> In this case the IP address alone is not necessarily unique. The
+> pair of values, Access Point Name (APN) and IPv4 address and/or
+> IPv6 prefixes, is unique.
+
+In order to support the overlapping address range use case, each APN
+is mapped to a separate Gi/SGi interface (network device).
+
+NOTE: The Access Point Name is purely a control plane (GTP-C) concept.
+At the GTP-U level, only Tunnel Endpoint Identifiers are present in
+GTP-U packets and network devices are known
+
+Therefore for a given UE the mapping in IP to PDN network is:
+ * network device + MS IP -> Peer IP + Peer TEID,
+
+and from PDN to IP network:
+ * local GTP-U IP + TEID -> network device
+
+Furthermore, before a received T-PDU is injected into the network
+device the MS IP is checked against the IP recorded in PDP context.
diff --git a/Documentation/networking/ila.txt b/Documentation/networking/ila.txt
new file mode 100644
index 000000000000..78df879abd26
--- /dev/null
+++ b/Documentation/networking/ila.txt
@@ -0,0 +1,285 @@
+Identifier Locator Addressing (ILA)
+
+
+Introduction
+============
+
+Identifier-locator addressing (ILA) is a technique used with IPv6 that
+differentiates between location and identity of a network node. Part of an
+address expresses the immutable identity of the node, and another part
+indicates the location of the node which can be dynamic. Identifier-locator
+addressing can be used to efficiently implement overlay networks for
+network virtualization as well as solutions for use cases in mobility.
+
+ILA can be thought of as means to implement an overlay network without
+encapsulation. This is accomplished by performing network address
+translation on destination addresses as a packet traverses a network. To
+the network, an ILA translated packet appears to be no different than any
+other IPv6 packet. For instance, if the transport protocol is TCP then an
+ILA translated packet looks like just another TCP/IPv6 packet. The
+advantage of this is that ILA is transparent to the network so that
+optimizations in the network, such as ECMP, RSS, GRO, GSO, etc., just work.
+
+The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila.
+
+
+ILA terminology
+===============
+
+ - Identifier A number that identifies an addressable node in the network
+ independent of its location. ILA identifiers are sixty-four
+ bit values.
+
+ - Locator A network prefix that routes to a physical host. Locators
+ provide the topological location of an addressed node. ILA
+ locators are sixty-four bit prefixes.
+
+ - ILA mapping
+ A mapping of an ILA identifier to a locator (or to a
+ locator and meta data). An ILA domain maintains a database
+ that contains mappings for all destinations in the domain.
+
+ - SIR address
+ An IPv6 address composed of a SIR prefix (upper sixty-
+ four bits) and an identifier (lower sixty-four bits).
+ SIR addresses are visible to applications and provide a
+ means for them to address nodes independent of their
+ location.
+
+ - ILA address
+ An IPv6 address composed of a locator (upper sixty-four
+ bits) and an identifier (low order sixty-four bits). ILA
+ addresses are never visible to an application.
+
+ - ILA host An end host that is capable of performing ILA translations
+ on transmit or receive.
+
+ - ILA router A network node that performs ILA translation and forwarding
+ of translated packets.
+
+ - ILA forwarding cache
+ A type of ILA router that only maintains a working set
+ cache of mappings.
+
+ - ILA node A network node capable of performing ILA translations. This
+ can be an ILA router, ILA forwarding cache, or ILA host.
+
+
+Operation
+=========
+
+There are two fundamental operations with ILA:
+
+ - Translate a SIR address to an ILA address. This is performed on ingress
+ to an ILA overlay.
+
+ - Translate an ILA address to a SIR address. This is performed on egress
+ from the ILA overlay.
+
+ILA can be deployed either on end hosts or intermediate devices in the
+network; these are provided by "ILA hosts" and "ILA routers" respectively.
+Configuration and datapath for these two points of deployment is somewhat
+different.
+
+The diagram below illustrates the flow of packets through ILA as well
+as showing ILA hosts and routers.
+
+ +--------+ +--------+
+ | Host A +-+ +--->| Host B |
+ | | | (2) ILA (') | |
+ +--------+ | ...addressed.... ( ) +--------+
+ V +---+--+ . packet . +---+--+ (_)
+ (1) SIR | | ILA |----->-------->---->| ILA | | (3) SIR
+ addressed +->|router| . . |router|->-+ addressed
+ packet +---+--+ . IPv6 . +---+--+ packet
+ / . Network .
+ / . . +--+-++--------+
+ +--------+ / . . |ILA || Host |
+ | Host +--+ . .- -|host|| |
+ | | . . +--+-++--------+
+ +--------+ ................
+
+
+Transport checksum handling
+===========================
+
+When an address is translated by ILA, an encapsulated transport checksum
+that includes the translated address in a pseudo header may be rendered
+incorrect on the wire. This is a problem for intermediate devices,
+including checksum offload in NICs, that process the checksum. There are
+three options to deal with this:
+
+- no action Allow the checksum to be incorrect on the wire. Before
+ a receiver verifies a checksum the ILA to SIR address
+ translation must be done.
+
+- adjust transport checksum
+ When ILA translation is performed the packet is parsed
+ and if a transport layer checksum is found then it is
+ adjusted to reflect the correct checksum per the
+ translated address.
+
+- checksum neutral mapping
+ When an address is translated the difference can be offset
+ elsewhere in a part of the packet that is covered by the
+ the checksum. The low order sixteen bits of the identifier
+ are used. This method is preferred since it doesn't require
+ parsing a packet beyond the IP header and in most cases the
+ adjustment can be precomputed and saved with the mapping.
+
+Note that the checksum neutral adjustment affects the low order sixteen
+bits of the identifier. When ILA to SIR address translation is done on
+egress the low order bits are restored to the original value which
+restores the identifier as it was originally sent.
+
+
+Identifier types
+================
+
+ILA defines different types of identifiers for different use cases.
+
+The defined types are:
+
+ 0: interface identifier
+
+ 1: locally unique identifier
+
+ 2: virtual networking identifier for IPv4 address
+
+ 3: virtual networking identifier for IPv6 unicast address
+
+ 4: virtual networking identifier for IPv6 multicast address
+
+ 5: non-local address identifier
+
+In the current implementation of kernel ILA only locally unique identifiers
+(LUID) are supported. LUID allows for a generic, unformatted 64 bit
+identifier.
+
+
+Identifier formats
+==================
+
+Kernel ILA supports two optional fields in an identifier for formatting:
+"C-bit" and "identifier type". The presence of these fields is determined
+by configuration as demonstrated below.
+
+If the identifier type is present it occupies the three highest order
+bits of an identifier. The possible values are given in the above list.
+
+If the C-bit is present, this is used as an indication that checksum
+neutral mapping has been done. The C-bit can only be set in an
+ILA address, never a SIR address.
+
+In the simplest format the identifier types, C-bit, and checksum
+adjustment value are not present so an identifier is considered an
+unstructured sixty-four bit value.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Identifier |
+ + +
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+The checksum neutral adjustment may be configured to always be
+present using neutral-map-auto. In this case there is no C-bit, but the
+checksum adjustment is in the low order 16 bits. The identifier is
+still sixty-four bits.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Identifier |
+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | Checksum-neutral adjustment |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+The C-bit may used to explicitly indicate that checksum neutral
+mapping has been applied to an ILA address. The format is:
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |C| Identifier |
+ | +-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | Checksum-neutral adjustment |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+The identifier type field may be present to indicate the identifier
+type. If it is not present then the type is inferred based on mapping
+configuration. The checksum neutral adjustment may automatically
+used with the identifier type as illustrated below.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type| Identifier |
+ +-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | Checksum-neutral adjustment |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+If the identifier type and the C-bit can be present simultaneously so
+the identifier format would be:
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type|C| Identifier |
+ +-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | Checksum-neutral adjustment |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+Configuration
+=============
+
+There are two methods to configure ILA mappings. One is by using LWT routes
+and the other is ila_xlat (called from NFHOOK PREROUTING hook). ila_xlat
+is intended to be used in the receive path for ILA hosts .
+
+An ILA router has also been implemented in XDP. Description of that is
+outside the scope of this document.
+
+The usage of for ILA LWT routes is:
+
+ip route add DEST/128 encap ila LOC csum-mode MODE ident-type TYPE via ADDR
+
+Destination (DEST) can either be a SIR address (for an ILA host or ingress
+ILA router) or an ILA address (egress ILA router). LOC is the sixty-four
+bit locator (with format W:X:Y:Z) that overwrites the upper sixty-four
+bits of the destination address. Checksum MODE is one of "no-action",
+"adj-transport", "neutral-map", and "neutral-map-auto". If neutral-map is
+set then the C-bit will be present. Identifier TYPE one of "luid" or
+"use-format." In the case of use-format, the identifier type field is
+present and the effective type is taken from that.
+
+The usage of ila_xlat is:
+
+ip ila add loc_match MATCH loc LOC csum-mode MODE ident-type TYPE
+
+MATCH indicates the incoming locator that must be matched to apply
+a the translaiton. LOC is the locator that overwrites the upper
+sixty-four bits of the destination address. MODE and TYPE have the
+same meanings as described above.
+
+
+Some examples
+=============
+
+# Configure an ILA route that uses checksum neutral mapping as well
+# as type field. Note that the type field is set in the SIR address
+# (the 2000 implies type is 1 which is LUID).
+ip route add 3333:0:0:1:2000:0:1:87/128 encap ila 2001:0:87:0 \
+ csum-mode neutral-map ident-type use-format
+
+# Configure an ILA LWT route that uses auto checksum neutral mapping
+# (no C-bit) and configure identifier type to be LUID so that the
+# identifier type field will not be present.
+ip route add 3333:0:0:1:2000:0:2:87/128 encap ila 2001:0:87:1 \
+ csum-mode neutral-map-auto ident-type luid
+
+ila_xlat configuration
+
+# Configure an ILA to SIR mapping that matches a locator and overwrites
+# it with a SIR address (3333:0:0:1 in this example). The C-bit and
+# identifier field are used.
+ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
+ csum-mode neutral-map-auto ident-type use-format
+
+# Configure an ILA to SIR mapping where checksum neutral is automatically
+# set without the C-bit and the identifier type is configured to be LUID
+# so that the identifier type field is not present.
+ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
+ csum-mode neutral-map-auto ident-type use-format
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b3345d0fe0a6..46c7e1085efc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -289,8 +289,7 @@ tcp_ecn_fallback - BOOLEAN
Default: 1 (fallback enabled)
tcp_fack - BOOLEAN
- Enable FACK congestion avoidance and fast retransmission.
- The value is not used, if tcp_sack is not enabled.
+ This is a legacy option, it has no effect anymore.
tcp_fin_timeout - INTEGER
The length of time an orphaned (no longer referenced by any
@@ -454,6 +453,7 @@ tcp_recovery - INTEGER
RACK: 0x1 enables the RACK loss detection for fast detection of lost
retransmissions and tail drops.
+ RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
Default: 0x1
@@ -1385,6 +1385,30 @@ mld_qrv - INTEGER
Default: 2 (as specified by RFC3810 9.1)
Minimum: 1 (as specified by RFC6636 4.5)
+max_dst_opts_cnt - INTEGER
+ Maximum number of non-padding TLVs allowed in a Destination
+ options extension header. If this value is less than zero
+ then unknown options are disallowed and the number of known
+ TLVs allowed is the absolute value of this number.
+ Default: 8
+
+max_hbh_opts_cnt - INTEGER
+ Maximum number of non-padding TLVs allowed in a Hop-by-Hop
+ options extension header. If this value is less than zero
+ then unknown options are disallowed and the number of known
+ TLVs allowed is the absolute value of this number.
+ Default: 8
+
+max dst_opts_len - INTEGER
+ Maximum length allowed for a Destination options extension
+ header.
+ Default: INT_MAX (unlimited)
+
+max hbh_opts_len - INTEGER
+ Maximum length allowed for a Hop-by-Hop options extension
+ header.
+ Default: INT_MAX (unlimited)
+
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
@@ -1680,6 +1704,9 @@ accept_dad - INTEGER
2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
link-local address has been found.
+ DAD operation and mode on a given interface will be selected according
+ to the maximum value of conf/{all,interface}/accept_dad.
+
force_tllao - BOOLEAN
Enable sending the target link-layer address option even when
responding to a unicast neighbor solicitation.
@@ -1704,6 +1731,15 @@ ndisc_notify - BOOLEAN
1 - Generate unsolicited neighbour advertisements when device is brought
up or hardware address changes.
+ndisc_tclass - INTEGER
+ The IPv6 Traffic Class to use by default when sending IPv6 Neighbor
+ Discovery (Router Solicitation, Router Advertisement, Neighbor
+ Solicitation, Neighbor Advertisement, Redirect) messages.
+ These 8 bits can be interpreted as 6 high order bits holding the DSCP
+ value and 2 low order bits representing ECN (which you probably want
+ to leave cleared).
+ 0 - (default)
+
mldv1_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv1 report retransmit will take place.
@@ -1727,16 +1763,23 @@ suppress_frag_ndisc - INTEGER
optimistic_dad - BOOLEAN
Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
- 0: disabled (default)
- 1: enabled
+ 0: disabled (default)
+ 1: enabled
+
+ Optimistic Duplicate Address Detection for the interface will be enabled
+ if at least one of conf/{all,interface}/optimistic_dad is set to 1,
+ it will be disabled otherwise.
use_optimistic - BOOLEAN
If enabled, do not classify optimistic addresses as deprecated during
source address selection. Preferred addresses will still be chosen
before optimistic addresses, subject to other ranking in the source
address selection algorithm.
- 0: disabled (default)
- 1: enabled
+ 0: disabled (default)
+ 1: enabled
+
+ This will be enabled if at least one of
+ conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
stable_secret - IPv6 address
This IPv6 address will be used as a secret to generate IPv6
diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
index 1fe42a874aae..812ef003e0a8 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.txt
@@ -22,9 +22,21 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
There are no module parameters for this driver and it can be configured
using IProute2/ip utility.
- ip link add link <master-dev> name <slave-dev> type ipvlan mode { l2 | l3 | l3s }
-
- e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2
+ ip link add link <master> name <slave> type ipvlan [ mode MODE ] [ FLAGS ]
+ where
+ MODE: l3 (default) | l3s | l2
+ FLAGS: bridge (default) | private | vepa
+
+ e.g.
+ (a) Following will create IPvlan link with eth0 as master in
+ L3 bridge mode
+ bash# ip link add link eth0 name ipvl0 type ipvlan
+ (b) This command will create IPvlan link in L2 bridge mode.
+ bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge
+ (c) This command will create an IPvlan device in L2 private mode.
+ bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private
+ (d) This command will create an IPvlan device in L2 vepa mode.
+ bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa
4. Operating modes:
@@ -54,7 +66,29 @@ works in this mode and hence it is L3-symmetric (L3s). This will have slightly l
performance but that shouldn't matter since you are choosing this mode over plain-L3
mode to make conn-tracking work.
-5. What to choose (macvlan vs. ipvlan)?
+5. Mode flags:
+ At this time following mode flags are available
+
+5.1 bridge:
+ This is the default option. To configure the IPvlan port in this mode,
+user can choose to either add this option on the command-line or don't specify
+anything. This is the traditional mode where slaves can cross-talk among
+themseleves apart from talking through the master device.
+
+5.2 private:
+ If this option is added to the command-line, the port is set in private
+mode. i.e. port wont allow cross communication between slaves.
+
+5.3 vepa:
+ If this is added to the command-line, the port is set in VEPA mode.
+i.e. port will offload switching functionality to the external entity as
+described in 802.1Qbg
+Note: VEPA mode in IPvlan has limitations. IPvlan uses the mac-address of the
+master-device, so the packets which are emitted in this mode for the adjacent
+neighbor will have source and destination mac same. This will make the switch /
+router send the redirect message.
+
+6. What to choose (macvlan vs. ipvlan)?
These two devices are very similar in many regards and the specific use
case could very well define which device to choose. if one of the following
situations defines your use case then you can choose to use ipvlan -
diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
index cfc66ea72329..2a3278d5cf35 100644
--- a/Documentation/networking/netdev-FAQ.txt
+++ b/Documentation/networking/netdev-FAQ.txt
@@ -64,7 +64,10 @@ A: To understand this, you need to know a bit of background information
If you aren't subscribed to netdev and/or are simply unsure if net-next
has re-opened yet, simply check the net-next git repository link above for
- any new networking-related commits.
+ any new networking-related commits. You may also check the following
+ website for the current status:
+
+ http://vger.kernel.org/~davem/net-next.html
The "net" tree continues to collect fixes for the vX.Y content, and
is fed back to Linus at regular (~weekly) intervals. Meaning that the
diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt
index 93560fb1170a..92f5b31392fa 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/netvsc.txt
@@ -19,12 +19,12 @@ Features
Receive Side Scaling
--------------------
- Hyper-V supports receive side scaling. For TCP, packets are
- distributed among available queues based on IP address and port
+ Hyper-V supports receive side scaling. For TCP & UDP, packets can
+ be distributed among available queues based on IP address and port
number.
- For UDP, we can switch UDP hash level between L3 and L4 by ethtool
- command. UDP over IPv4 and v6 can be set differently. The default
+ For TCP & UDP, we can switch hash level between L3 and L4 by ethtool
+ command. TCP/UDP over IPv4 and v6 can be set differently. The default
hash level is L4. We currently only allow switching TX hash level
from within the guests.
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index f3b9e507ab05..bf654845556e 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -1055,7 +1055,7 @@ TX_RING part only TP_STATUS_AVAILABLE is set, then the tp_sec and tp_{n,u}sec
members do not contain a valid value. For TX_RINGs, by default no timestamp
is generated!
-See include/linux/net_tstamp.h and Documentation/networking/timestamping
+See include/linux/net_tstamp.h and Documentation/networking/timestamping.txt
for more information on hardware timestamps.
-------------------------------------------------------------------------------
diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.txt
index 7818b5fe448b..381e5b23d61d 100644
--- a/Documentation/networking/regulatory.txt
+++ b/Documentation/networking/regulatory.txt
@@ -19,6 +19,14 @@ core regulatory domain all wireless devices should adhere to.
How to get regulatory domains to the kernel
-------------------------------------------
+When the regulatory domain is first set up, the kernel will request a
+database file (regulatory.db) containing all the regulatory rules. It
+will then use that database when it needs to look up the rules for a
+given country.
+
+How to get regulatory domains to the kernel (old CRDA solution)
+---------------------------------------------------------------
+
Userspace gets a regulatory domain in the kernel by having
a userspace agent build it and send it via nl80211. Only
expected regulatory domains will be respected by the kernel.
@@ -192,23 +200,5 @@ Then in some part of your code after your wiphy has been registered:
Statically compiled regulatory database
---------------------------------------
-In most situations the userland solution using CRDA as described
-above is the preferred solution. However in some cases a set of
-rules built into the kernel itself may be desirable. To account
-for this situation, a configuration option has been provided
-(i.e. CONFIG_CFG80211_INTERNAL_REGDB). With this option enabled,
-the wireless database information contained in net/wireless/db.txt is
-used to generate a data structure encoded in net/wireless/regdb.c.
-That option also enables code in net/wireless/reg.c which queries
-the data in regdb.c as an alternative to using CRDA.
-
-The file net/wireless/db.txt should be kept up-to-date with the db.txt
-file available in the git repository here:
-
- git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/wireless-regdb.git
-
-Again, most users in most situations should be using the CRDA package
-provided with their distribution, and in most other situations users
-should be building and using CRDA on their own rather than using
-this option. If you are not absolutely sure that you should be using
-CONFIG_CFG80211_INTERNAL_REGDB then _DO_NOT_USE_IT_.
+When a database should be fixed into the kernel, it can be provided as a
+firmware file at build time that is then linked into the kernel.
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index 810620153a44..b5407163d53b 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -280,6 +280,18 @@ Interaction with the user of the RxRPC socket:
nominated by a socket option.
+Notes on sendmsg:
+
+ (*) MSG_WAITALL can be set to tell sendmsg to ignore signals if the peer is
+ making progress at accepting packets within a reasonable time such that we
+ manage to queue up all the data for transmission. This requires the
+ client to accept at least one packet per 2*RTT time period.
+
+ If this isn't set, sendmsg() will return immediately, either returning
+ EINTR/ERESTARTSYS if nothing was consumed or returning the amount of data
+ consumed.
+
+
Notes on recvmsg:
(*) If there's a sequence of data messages belonging to a particular call on
@@ -782,7 +794,9 @@ The kernel interface functions are as follows:
struct key *key,
unsigned long user_call_ID,
s64 tx_total_len,
- gfp_t gfp);
+ gfp_t gfp,
+ rxrpc_notify_rx_t notify_rx,
+ bool upgrade);
This allocates the infrastructure to make a new RxRPC call and assigns
call and connection numbers. The call will be made on the UDP port that
@@ -803,6 +817,13 @@ The kernel interface functions are as follows:
allows the kernel to encrypt directly to the packet buffers, thereby
saving a copy. The value may not be less than -1.
+ notify_rx is a pointer to a function to be called when events such as
+ incoming data packets or remote aborts happen.
+
+ upgrade should be set to true if a client operation should request that
+ the server upgrade the service to a better one. The resultant service ID
+ is returned by rxrpc_kernel_recv_data().
+
If this function is successful, an opaque reference to the RxRPC call is
returned. The caller now holds a reference on this and it must be
properly ended.
@@ -850,7 +871,8 @@ The kernel interface functions are as follows:
size_t size,
size_t *_offset,
bool want_more,
- u32 *_abort)
+ u32 *_abort,
+ u16 *_service)
This is used to receive data from either the reply part of a client call
or the request part of a service call. buf and size specify how much
@@ -873,6 +895,9 @@ The kernel interface functions are as follows:
If a remote ABORT is detected, the abort code received will be stored in
*_abort and ECONNABORTED will be returned.
+ The service ID that the call ended up with is returned into *_service.
+ This can be used to see if a call got a service upgrade.
+
(*) Abort a call.
void rxrpc_kernel_abort_call(struct socket *sock,
@@ -1020,6 +1045,30 @@ The kernel interface functions are as follows:
It returns 0 if the call was requeued and an error otherwise.
+ (*) Get call RTT.
+
+ u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
+
+ Get the RTT time to the peer in use by a call. The value returned is in
+ nanoseconds.
+
+ (*) Check call still alive.
+
+ u32 rxrpc_kernel_check_life(struct socket *sock,
+ struct rxrpc_call *call);
+
+ This returns a number that is updated when ACKs are received from the peer
+ (notably including PING RESPONSE ACKs which we can elicit by sending PING
+ ACKs to see if the call still exists on the server). The caller should
+ compare the numbers of two calls to see if the call is still alive after
+ waiting for a suitable interval.
+
+ This allows the caller to work out if the server is still contactable and
+ if the call is still alive on the server whilst waiting for the server to
+ process a client operation.
+
+ This function may transmit a PING ACK.
+
=======================
CONFIGURABLE PARAMETERS
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 5e40e1f68873..82236a17b5e6 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -13,42 +13,42 @@ an example setup using a data-center-class switch ASIC chip. Other setups
with SR-IOV or soft switches, such as OVS, are possible.
-                             User-space tools
-
-       user space                   |
-      +-------------------------------------------------------------------+
-       kernel                       | Netlink
-                                    |
-                     +--------------+-------------------------------+
-                     |         Network stack                        |
-                     |           (Linux)                            |
-                     |                                              |
-                     +----------------------------------------------+
+ User-space tools
+
+ user space |
+ +-------------------------------------------------------------------+
+ kernel | Netlink
+ |
+ +--------------+-------------------------------+
+ | Network stack |
+ | (Linux) |
+ | |
+ +----------------------------------------------+
sw1p2 sw1p4 sw1p6
-                      sw1p1  + sw1p3 +  sw1p5 +         eth1
-                        +    |    +    |    +    |            +
-                        |    |    |    |    |    |            |
-                     +--+----+----+----+-+--+----+---+  +-----+-----+
-                     |         Switch driver         |  |    mgmt   |
-                     |        (this document)        |  |   driver  |
-                     |                               |  |           |
-                     +--------------+----------------+  +-----------+
-                                    |
-       kernel                       | HW bus (eg PCI)
-      +-------------------------------------------------------------------+
-       hardware                     |
-                     +--------------+---+------------+
-                     |         Switch device (sw1)   |
-                     |  +----+                       +--------+
-                     |  |    v offloaded data path   | mgmt port
-                     |  |    |                       |
-                     +--|----|----+----+----+----+---+
-                        |    |    |    |    |    |
-                        +    +    +    +    +    +
-                       p1   p2   p3   p4   p5   p6
-
-                             front-panel ports
+ sw1p1 + sw1p3 + sw1p5 + eth1
+ + | + | + | +
+ | | | | | | |
+ +--+----+----+----+----+----+---+ +-----+-----+
+ | Switch driver | | mgmt |
+ | (this document) | | driver |
+ | | | |
+ +--------------+----------------+ +-----------+
+ |
+ kernel | HW bus (eg PCI)
+ +-------------------------------------------------------------------+
+ hardware |
+ +--------------+----------------+
+ | Switch device (sw1) |
+ | +----+ +--------+
+ | | v offloaded data path | mgmt port
+ | | | |
+ +--|----|----+----+----+----+---+
+ | | | | | |
+ + + + + + +
+ p1 p2 p3 p4 p5 p6
+
+ front-panel ports
Fig 1.
diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
index 3918dae964d4..8ff7b4c8f91b 100644
--- a/Documentation/networking/vrf.txt
+++ b/Documentation/networking/vrf.txt
@@ -71,7 +71,12 @@ Setup
ip ru add iif vrf-blue table 10
3. Set the default route for the table (and hence default route for the VRF).
- ip route add table 10 unreachable default
+ ip route add table 10 unreachable default metric 4278198272
+
+ This high metric value ensures that the default unreachable route can
+ be overridden by a routing protocol suite. FRRouting interprets
+ kernel metrics as a combined admin distance (upper byte) and priority
+ (lower 3 bytes). Thus the above metric translates to [255/8192].
4. Enslave L3 interfaces to a VRF device.
ip link set dev eth1 master vrf-blue
@@ -256,7 +261,7 @@ older form without it.
For example:
$ ip route show vrf red
- prohibit default
+ unreachable default metric 4278198272
broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2
10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2
local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2
@@ -282,7 +287,7 @@ older form without it.
ff00::/8 dev red metric 256 pref medium
ff00::/8 dev eth1 metric 256 pref medium
ff00::/8 dev eth2 metric 256 pref medium
-
+ unreachable default dev lo metric 4278198272 error -101 pref medium
8. Route Lookup for a VRF
@@ -331,7 +336,7 @@ function vrf_create
ip link add ${VRF} type vrf table ${TBID}
if [ "${VRF}" != "mgmt" ]; then
- ip route add table ${TBID} unreachable default
+ ip route add table ${TBID} unreachable default metric 4278198272
fi
ip link set dev ${VRF} up
}