summaryrefslogtreecommitdiff
path: root/include/net/ip_tunnels.h
AgeCommit message (Collapse)Author
2017-04-21ip_tunnel: Allow policy-based routing through tunnelsCraig Gallek
This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions through IPv4 tunnels, so this implementation does not need to work with per-packet route lookups as the v6 implementation may (with IP6_TNL_F_USE_ORIG_FWMARK). Further, since the v4 tunnel ioctls share datastructures (which can not be trivially modified) with the kernel's internal tunnel configuration structures, the mark attribute must be stored in the tunnel structure itself and passed as a parameter when creating or changing tunnel attributes. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info modeRoopa Prabhu
New ip_tunnel_info flag to represent bridged tunnel metadata. Used by bridge driver later in the series to pass per vlan dst metadata to bridge ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17ip_tunnel: add collect_md mode to IPIP tunnelAlexei Starovoitov
Similar to gre, vxlan, geneve tunnels allow IPIP tunnels to operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. ovs can use it as well in the future (once appropriate ovs-vport abstractions and user apis are added). Note that just like in other tunnels we cannot cache the dst, since tunnel_info metadata can be different for every packet. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()Amir Vadai
Add utility functions to convert a 32 bits key into a 64 bits tunnel and vice versa. These functions will be used instead of cloning code in GRE and VXLAN, and in tc act_iptunnel which will be introduced in a following patch in this patchset. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-18gre: better support for ICMP messages for gre+ipv6Eric Dumazet
ipgre_err() can call ip6_err_gen_icmpv6_unreach() for proper support of ipv4+gre+icmp+ipv6+... frames, used for example by traceroute/mtr. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloadsPhilip Prindeville
In the presence of firewalls which improperly block ICMP Unreachable (including Fragmentation Required) messages, Path MTU Discovery is prevented from working. A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as is done for other payloads like AppleTalk--and doing transparent fragmentation and reassembly. Redux includes the enforcement of mutual exclusion between this feature and Path MTU Discovery as suggested by Alexander Duyck. Cc: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20net: Cleanup encap items in ip_tunnels.hTom Herbert
Consolidate all the ip_tunnel_encap definitions in one spot in the header file. Also, move ip_encap_hlen and ip_tunnel_encap from ip_tunnel.c to ip_tunnels.h so they call be called without a dependency on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04gre: receive also TEB packets for lwtunnelsJiri Benc
For ipgre interfaces in collect metadata mode, receive also traffic with encapsulated Ethernet headers. The lwtunnel users are supposed to sort this out correctly. This allows to have mixed Ethernet + L3-only traffic on the same lwtunnel interface. This is the same way as VXLAN-GPE behaves. To keep backwards compatibility and prevent any surprises, gretap interfaces have priority in receiving packets with Ethernet headers. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skbAlexander Duyck
This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14gre: eliminate holes in ip_tunnelstephen hemminger
The structure can be packed denser by doing minor rearrangement of existing elements. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06ip_tunnel: implement __iptunnel_pull_headerJiri Benc
Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20tunnels: Remove encapsulation offloads on decap.Jesse Gross
If a packet is either locally encapsulated or processed through GRO it is marked with the offloads that it requires. However, when it is decapsulated these tunnel offload indications are not removed. This means that if we receive an encapsulated TCP packet, aggregate it with GRO, decapsulate, and retransmit the resulting frame on a NIC that does not support encapsulation, we won't be able to take advantage of hardware offloads even though it is just a simple TCP packet at this point. This fixes the problem by stripping off encapsulation offload indications when packets are decapsulated. The performance impacts of this bug are significant. In a test where a Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated, and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a result of avoiding unnecessary segmentation at the VM tap interface. Reported-by: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com> Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use itDaniel Borkmann
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11ip_tunnel: add support for setting flow label via collect metadataDaniel Borkmann
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label from call sites. Currently, there's no such option and it's always set to zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so that flow-based tunnels via collect metadata frontends can make use of it. vxlan and geneve will be converted to add flow label support separately. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08ip_tunnel, bpf: ip_tunnel_info_opts_{get, set} depends on CONFIG_INETDaniel Borkmann
Helpers like ip_tunnel_info_opts_{get,set}() are only available if CONFIG_INET is set, thus add an empty definition into the header for the !CONFIG_INET case, where already other empty inline helpers are defined. This avoids ifdef kludge inside filter.c, but also vxlan and geneve themself where this facility can only be used with, depend on INET being set. For the !INET case TUNNEL_OPTIONS_PRESENT would never be set in flags. Fixes: 14ca0751c96f ("bpf: support for access to tunnel options") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08bpf, vxlan, geneve, gre: fix usage of dst_cache on xmitDaniel Borkmann
The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18iptunnel: scrub packet in iptunnel_pull_headerJiri Benc
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16net: add dst_cache to ovs vxlan lwtunnelPaolo Abeni
In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16ip_tunnel: replace dst_cache with generic implementationPaolo Abeni
The current ip_tunnel cache implementation is prone to a race that will cause the wrong dst to be cached on cuncurrent dst cache miss and ip tunnel update via netlink. Replacing with the generic implementation fix the issue. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloadsEdward Cree
All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-25ip_tunnel: Move stats update to iptunnel_xmit()Pravin B Shelar
By moving stats update into iptunnel_xmit(), we can simplify iptunnel_xmit() usage. With this change there is no need to call another function (iptunnel_xmit_stats()) to update stats in tunnel xmit code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16ip_tunnel: disable preemption when updating per-cpu tstatsJason A. Donenfeld
Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24ipv4: send arp replies to the correct tunnelJiri Benc
When using ip lwtunnels, the additional data for xmit (basically, the actual tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in metadata dst. When replying to ARP requests, we need to send the reply to the same tunnel the request came from. This means we need to construct proper metadata dst for ARP replies. We could perform another route lookup to get a dst entry with the correct lwtstate. However, this won't always ensure that the outgoing tunnel is the same as the incoming one, and it won't work anyway for IPv4 duplicate address detection. The only thing to do is to "reverse" the ip_tunnel_info. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31ip-tunnel: Use API to access tunnel metadata options.Pravin B Shelar
Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29ip_tunnels: record IP version in tunnel infoJiri Benc
There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29ip_tunnels: convert the mode field of ip_tunnel_info to flagsJiri Benc
The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20ip_tunnels: use tos and ttl fields also for IPv6Jiri Benc
Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll be used with IPv6 tunnels, too. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20ip_tunnels: add IPv6 addresses to ip_tunnel_keyJiri Benc
Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the newly introduced padding after the IPv4 addresses needs to be zeroed out. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20ip_tunnels: use offsetofendJiri Benc
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20ip_tunnels: use u8/u16/u32Jiri Benc
The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types. Unify it to the non-underscore variants. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-20ip_tunnels: remove custom alignment and packingJiri Benc
The custom alignment of struct ip_tunnel_key is unnecessary. In struct sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the first field. The structure is also packed even without the __packed keyword. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10ip_gre: Add support to collect tunnel metadata.Pravin B Shelar
Following patch create new tunnel flag which enable tunnel metadata collection on given device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23ip_tunnel: Call ip_tunnel_core_init() from inet_init()Thomas Graf
Convert the module_init() to a invocation from inet_init() since ip_tunnel_core is part of the INET built-in. Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22ip_tunnel: Provide tunnel metadata API for CONFIG_INET=nThomas Graf
Account for the configuration FIB_RULES=y && INET=n as FIB_RULES can be selected by IPV6 or DECNET without INET. Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id") Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21fib: Add fib rule match on tunnel idThomas Graf
This add the ability to select a routing table based on the tunnel id which allows to maintain separate routing tables for each virtual tunnel network. ip rule add from all tunnel-id 100 lookup 100 ip rule add from all tunnel-id 200 lookup 200 A new static key controls the collection of metadata at tunnel level upon demand. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21route: Per route IP tunnel metadata via lightweight tunnelThomas Graf
This introduces a new IP tunnel lightweight tunnel type which allows to specify IP tunnel instructions per route. Only IPv4 is supported at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21vxlan: Flow based tunnelingThomas Graf
Allows putting a VXLAN device into a new flow-based mode in which skbs with a ip_tunnel_info dst metadata attached will be encapsulated according to the instructions stored in there with the VXLAN device defaults taken into consideration. Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is set, the packet processing will populate a ip_tunnel_info struct for each packet received and attach it to the skb using the new metadata dst. The metadata structure will contain the outer header and tunnel header fields which have been stripped off. Layers further up in the stack such as routing, tc or netfitler can later match on these fields and perform forwarding. It is the responsibility of upper layers to ensure that the flag is set if the metadata is needed. The flag limits the additional cost of metadata collecting based on demand. This prepares the VXLAN device to be steered by the routing and other subsystems which allows to support encapsulation for a large number of tunnel endpoints and tunnel ids through a single net_device which improves the scalability. It also allows for OVS to leverage this mode which in turn allows for the removal of the OVS specific VXLAN code. Because the skb is currently scrubed in vxlan_rcv(), the attachment of the new dst metadata is postponed until after scrubing which requires the temporary addition of a new member to vxlan_metadata. This member is removed again in a later commit after the indirect VXLAN receive API has been removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel genericThomas Graf
Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02ipip,gre,vti,sit: implement ndo_get_iflinkNicolas Dichtel
Don't use dev->iflink anymore. CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19tunnels: advertise link netns via netlinkNicolas Dichtel
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15openvswitch: Support VXLAN Group Policy extensionThomas Graf
Introduces support for the group policy extension to the VXLAN virtual port. The extension is disabled by default and only enabled if the user has provided the respective configuration. ovs-vsctl add-port br0 vxlan0 -- \ set Interface vxlan0 type=vxlan options:exts=gbp The configuration interface to enable the extension is based on a new attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION which can carry additional extensions as needed in the future. The group policy metadata is stored as binary blob (struct ovs_vxlan_opts) internally just like Geneve options but transported as nested Netlink attributes to user space. Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced. The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12ip_tunnel: Ops registration for secondary encap (fou, gue)Tom Herbert
Instead of calling fou and gue functions directly from ip_tunnel use ops for these that were previously registered. This patch adds the logic to add and remove encapsulation operations for ip_tunnel, and modified fou (and gue) to register with ip_tunnels. This patch also addresses a circular dependency between ip_tunnel and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y and CONFIG_NET_FOU=m. References to fou an gue have been removed from ip_tunnel.c Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06openvswitch: Add support for Geneve tunneling.Jesse Gross
The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array. Signed-off-by: Jesse Gross <jesse@nicira.com> Singed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06net: Add Geneve tunneling protocol driverAndy Zhou
This adds a device level support for Geneve -- Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-01 Only protocol layer Geneve support is provided by this driver. Openvswitch can be used for configuring, set up and tear down functional Geneve tunnels. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19net: Changes to ip_tunnel to support foo-over-udp encapsulationTom Herbert
This patch changes IP tunnel to support (secondary) encapsulation, Foo-over-UDP. Changes include: 1) Adding tun_hlen as the tunnel header length, encap_hlen as the encapsulation header length, and hlen becomes the grand total of these. 2) Added common netlink define to support FOU encapsulation. 3) Routines to perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"Dmitry Popov
Ipv4 tunnels created with "local any remote $ip" didn't work properly since 7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels had src addr = 0.0.0.0. That was because only dst_entry was cached, although fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry (tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit(). This patch adds saddr to ip_tunnel->dst_cache, fixing this issue. Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>