summaryrefslogtreecommitdiff
path: root/drivers/net/bareudp.c
AgeCommit message (Collapse)Author
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04net: bareudp: Remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04net: bareudp: Do not allocate stats in the driverBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the bareudp driver and leverage the network core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07bareudp: use exit_batch_rtnl() methodEric Dumazet
exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair, and one unregister_netdevice_many() call. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26bareudp: use ports to lookup routeBeniamino Galvani
The source and destination ports should be taken into account when determining the route destination; they can affect the result, for example in case there are routing rules defined. Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231025094441.417464-1-b.galvani@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-23ipv6: add new arguments to udp_tunnel6_dst_lookup()Beniamino Galvani
We want to make the function more generic so that it can be used by other UDP tunnel implementations such as geneve and vxlan. To do that, add the following arguments: - source and destination UDP port; - ifindex of the output interface, needed by vxlan; - the tos, because in some cases it is not taken from struct ip_tunnel_info (for example, when it's inherited from the inner packet); - the dst cache, because not all tunnel types (e.g. vxlan) want to use the one from struct ip_tunnel_info. With these parameters, the function no longer needs the full struct ip_tunnel_info as argument and we can pass only the relevant part of it (struct ip_tunnel_key). This is similar to what already done for IPv4 in commit 72fc68c6356b ("ipv4: add new arguments to udp_tunnel_dst_lookup()"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23ipv6: remove "proto" argument from udp_tunnel6_dst_lookup()Beniamino Galvani
The function is now UDP-specific, the protocol is always IPPROTO_UDP. This is similar to what already done for IPv4 in commit 78f3655adcb5 ("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23ipv6: rename and move ip6_dst_lookup_tunnel()Beniamino Galvani
At the moment ip6_dst_lookup_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel6_dst_lookup() and move it to file net/ipv6/ip6_udp_tunnel.c. This is similar to what already done for IPv4 in commit bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: add new arguments to udp_tunnel_dst_lookup()Beniamino Galvani
We want to make the function more generic so that it can be used by other UDP tunnel implementations such as geneve and vxlan. To do that, add the following arguments: - source and destination UDP port; - ifindex of the output interface, needed by vxlan; - the tos, because in some cases it is not taken from struct ip_tunnel_info (for example, when it's inherited from the inner packet); - the dst cache, because not all tunnel types (e.g. vxlan) want to use the one from struct ip_tunnel_info. With these parameters, the function no longer needs the full struct ip_tunnel_info as argument and we can pass only the relevant part of it (struct ip_tunnel_key). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: remove "proto" argument from udp_tunnel_dst_lookup()Beniamino Galvani
The function is now UDP-specific, the protocol is always IPPROTO_UDP. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: rename and move ip_route_output_tunnel()Beniamino Galvani
At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel_dst_lookup() and move it to file net/ipv4/udp_tunnel_core.c. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-16bareudp: use ipv6_mod_enabled to check if IPv6 enabledHangbin Liu
bareudp_create_sock() use AF_INET6 by default if IPv6 CONFIG enabled. But if user start kernel with ipv6.disable=1, the bareudp sock will created failed, which cause the interface open failed even with ethertype ip. e.g. # ip link add bareudp1 type bareudp dstport 2 ethertype ip # ip link set bareudp1 up RTNETLINK answers: Address family not supported by protocol Fix it by using ipv6_mod_enabled() to check if IPv6 enabled. There is no need to check IS_ENABLED(CONFIG_IPV6) as ipv6_mod_enabled() will return false when CONFIG_IPV6 no enabled in include/linux/ipv6.h. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20220315062618.156230-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14bareudp: Add extack support to bareudp_configure()Guillaume Nault
Add missing extacks for common configuration errors. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13bareudp: Move definition of struct bareudp_conf to bareudp.cGuillaume Nault
This structure is used only in bareudp.c. While there, adjust include files: we need netdevice.h, not skbuff.h. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13bareudp: Remove bareudp_dev_create()Guillaume Nault
There's no user for this function. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29net: bareudp: fix duplicate checks of data[] expressionsJean Sacren
Both !data[IFLA_BAREUDP_PORT] and !data[IFLA_BAREUDP_ETHERTYPE] are checked. We should remove the checks of data[IFLA_BAREUDP_PORT] and data[IFLA_BAREUDP_ETHERTYPE] that follow since they are always true. Put both statements together in group and balance the space on both sides of '=' sign. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-09bareudp: Fix invalid read beyond skb's linear dataGuillaume Nault
Data beyond the UDP header might not be part of the skb's linear data. Use skb_copy_bits() instead of direct access to skb->data+X, so that we read the correct bytes even on a fragmented skb. Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.1628264975.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-28bareudp: allow redirecting bareudp packets to eth devicesGuillaume Nault
Even though bareudp transports L3 data (typically IP or MPLS), it needs to reset the mac_header pointer, so that other parts of the stack don't mistakenly access the outer header after the packet has been decapsulated. This allows to push an Ethernet header to bareudp packets and redirect them to an Ethernet device: $ tc filter add dev bareudp0 ingress matchall \ action vlan push_eth dst_mac 00:00:5e:00:53:01 \ src_mac 00:00:5e:00:53:00 \ action mirred egress redirect dev eth0 Without this patch, push_eth refuses to add an ethernet header because the skb appears to already have a MAC header. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30bareudp: allow UDP L4 GRO passthrouPaolo Abeni
Similar to the previous commit, let even geneve passthrou the L4 GRO packets Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-04udp: call udp_encap_enable for v6 sockets when enabling encapXin Long
When enabling encap for a ipv6 socket without udp_encap_needed_key increased, UDP GRO won't work for v4 mapped v6 address packets as sk will be NULL in udp4_gro_receive(). This patch is to enable it by increasing udp_encap_needed_key for v6 sockets in udp_tunnel_encap_enable(), and correspondingly decrease udp_encap_needed_key in udpv6_destroy_sock(). v1->v2: - add udp_encap_disable() and export it. v2->v3: - add the change for rxrpc and bareudp into one patch, as Alex suggested. v3->v4: - move rxrpc part to another patch. Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18bareudp: add NETIF_F_FRAGLIST flag for dev featuresXin Long
Like vxlan and geneve, bareudp also needs this dev feature to support some protocol's HW GSO. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: bareudp: simplify error paths calling dellinkJakub Kicinski
bareudp_dellink() only needs the device list to hand it to unregister_netdevice_queue(). We can pass NULL in, and unregister_netdevice_queue() will do the unregistering. There is no chance for batching on the error path, anyway. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20210111052922.2145003-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-06net: bareudp: add missing error handling for bareudp_link_config()Jakub Kicinski
.dellink does not get called after .newlink fails, bareudp_newlink() must undo what bareudp_configure() has done if bareudp_link_config() fails. v2: call bareudp_dellink(), like bareudp_dev_create() does Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Link: https://lore.kernel.org/r/20210105190725.1736246-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-04bareudp: Fix use of incorrect min_headroom sizeTaehee Yoo
In the bareudp6_xmit_skb(), it calculates min_headroom. At that point, it uses struct iphdr, but it's not correct. So panic could occur. The struct ipv6hdr should be used. Test commands: ip netns add A ip netns add B ip link add veth0 netns A type veth peer name veth1 netns B ip netns exec A ip link set veth0 up ip netns exec A ip a a 2001:db8:0::1/64 dev veth0 ip netns exec B ip link set veth1 up ip netns exec B ip a a 2001:db8:0::2/64 dev veth1 for i in {10..1} do let A=$i-1 ip netns exec A ip link add bareudp$i type bareudp dstport $i \ ethertype 0x86dd ip netns exec A ip link set bareudp$i up ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev bareudp$i ip netns exec A ip -6 r a 2001:db8:$i::2 encap ip6 src \ 2001:db8:$A::1 dst 2001:db8:$A::2 via 2001:db8:$i::2 \ dev bareudp$i ip netns exec B ip link add bareudp$i type bareudp dstport $i \ ethertype 0x86dd ip netns exec B ip link set bareudp$i up ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev bareudp$i ip netns exec B ip -6 r a 2001:db8:$i::1 encap ip6 src \ 2001:db8:$A::2 dst 2001:db8:$A::1 via 2001:db8:$i::1 \ dev bareudp$i done ip netns exec A ping 2001:db8:7::2 Splat looks like: [ 66.436679][ C2] skbuff: skb_under_panic: text:ffffffff928614c8 len:454 put:14 head:ffff88810abb4000 data:ffff88810abb3ffa tail:0x1c0 end:0x3ec0 dev:veth0 [ 66.441626][ C2] ------------[ cut here ]------------ [ 66.443458][ C2] kernel BUG at net/core/skbuff.c:109! [ 66.445313][ C2] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 66.447606][ C2] CPU: 2 PID: 913 Comm: ping Not tainted 5.10.0+ #819 [ 66.450251][ C2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 66.453713][ C2] RIP: 0010:skb_panic+0x15d/0x15f [ 66.455345][ C2] Code: 98 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 60 8b 78 93 41 57 41 56 41 55 48 8b 54 24 20 48 8b 74 24 28 e8 b5 40 f9 ff <0f> 0b 48 8b 6c 24 20 89 34 24 e8 08 c9 98 fe 8b 34 24 48 c7 c1 80 [ 66.462314][ C2] RSP: 0018:ffff888119209648 EFLAGS: 00010286 [ 66.464281][ C2] RAX: 0000000000000089 RBX: ffff888003159000 RCX: 0000000000000000 [ 66.467216][ C2] RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed10232412c0 [ 66.469768][ C2] RBP: ffff88810a53d440 R08: ffffed102328018d R09: ffffed102328018d [ 66.472297][ C2] R10: ffff888119400c67 R11: ffffed102328018c R12: 000000000000000e [ 66.474833][ C2] R13: ffff88810abb3ffa R14: 00000000000001c0 R15: 0000000000003ec0 [ 66.477361][ C2] FS: 00007f37c0c72f00(0000) GS:ffff888119200000(0000) knlGS:0000000000000000 [ 66.480214][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 66.482296][ C2] CR2: 000055a058808570 CR3: 000000011039e002 CR4: 00000000003706e0 [ 66.484811][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 66.487793][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 66.490424][ C2] Call Trace: [ 66.491469][ C2] <IRQ> [ 66.492374][ C2] ? eth_header+0x28/0x190 [ 66.494054][ C2] ? eth_header+0x28/0x190 [ 66.495401][ C2] skb_push.cold.99+0x22/0x22 [ 66.496700][ C2] eth_header+0x28/0x190 [ 66.497867][ C2] neigh_resolve_output+0x3de/0x720 [ 66.499615][ C2] ? __neigh_update+0x7e8/0x20a0 [ 66.501176][ C2] __neigh_update+0x8bd/0x20a0 [ 66.502749][ C2] ndisc_update+0x34/0xc0 [ 66.504010][ C2] ndisc_recv_na+0x8da/0xb80 [ 66.505041][ C2] ? pndisc_redo+0x20/0x20 [ 66.505888][ C2] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 66.506965][ C2] ndisc_rcv+0x3a0/0x470 [ 66.507797][ C2] icmpv6_rcv+0xad9/0x1b00 [ 66.508645][ C2] ip6_protocol_deliver_rcu+0xcd6/0x1560 [ 66.509719][ C2] ip6_input_finish+0x5b/0xf0 [ 66.510615][ C2] ip6_input+0xcd/0x2d0 [ 66.511406][ C2] ? ip6_input_finish+0xf0/0xf0 [ 66.512327][ C2] ? rcu_read_lock_held+0x91/0xa0 [ 66.513279][ C2] ? ip6_protocol_deliver_rcu+0x1560/0x1560 [ 66.514414][ C2] ipv6_rcv+0xe8/0x300 [ ... ] Acked-by: Guillaume Nault <gnault@redhat.com> Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20201228152146.24270-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-04bareudp: set NETIF_F_LLTX flagTaehee Yoo
Like other tunneling interfaces, the bareudp doesn't need TXLOCK. So, It is good to set the NETIF_F_LLTX flag to improve performance and to avoid lockdep's false-positive warning. Test commands: ip netns add A ip netns add B ip link add veth0 netns A type veth peer name veth1 netns B ip netns exec A ip link set veth0 up ip netns exec A ip a a 10.0.0.1/24 dev veth0 ip netns exec B ip link set veth1 up ip netns exec B ip a a 10.0.0.2/24 dev veth1 for i in {2..1} do let A=$i-1 ip netns exec A ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec A ip link set bareudp$i up ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \ dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i ip netns exec B ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec B ip link set bareudp$i up ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \ dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i done ip netns exec A ping 10.0.2.2 Splat looks like: [ 96.992803][ T822] ============================================ [ 96.993954][ T822] WARNING: possible recursive locking detected [ 96.995102][ T822] 5.10.0+ #819 Not tainted [ 96.995927][ T822] -------------------------------------------- [ 96.997091][ T822] ping/822 is trying to acquire lock: [ 96.998083][ T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 96.999813][ T822] [ 96.999813][ T822] but task is already holding lock: [ 97.001192][ T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.002908][ T822] [ 97.002908][ T822] other info that might help us debug this: [ 97.004401][ T822] Possible unsafe locking scenario: [ 97.004401][ T822] [ 97.005784][ T822] CPU0 [ 97.006407][ T822] ---- [ 97.007010][ T822] lock(_xmit_NONE#2); [ 97.007779][ T822] lock(_xmit_NONE#2); [ 97.008550][ T822] [ 97.008550][ T822] *** DEADLOCK *** [ 97.008550][ T822] [ 97.010057][ T822] May be due to missing lock nesting notation [ 97.010057][ T822] [ 97.011594][ T822] 7 locks held by ping/822: [ 97.012426][ T822] #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00 [ 97.014191][ T822] #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.016045][ T822] #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.017897][ T822] #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.019684][ T822] #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp] [ 97.021573][ T822] #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.023424][ T822] #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.025259][ T822] [ 97.025259][ T822] stack backtrace: [ 97.026349][ T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819 [ 97.027609][ T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 97.029407][ T822] Call Trace: [ 97.030015][ T822] dump_stack+0x99/0xcb [ 97.030783][ T822] __lock_acquire.cold.77+0x149/0x3a9 [ 97.031773][ T822] ? stack_trace_save+0x81/0xa0 [ 97.032661][ T822] ? register_lock_class+0x1910/0x1910 [ 97.033673][ T822] ? register_lock_class+0x1910/0x1910 [ 97.034679][ T822] ? rcu_read_lock_sched_held+0x91/0xc0 [ 97.035697][ T822] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 97.036690][ T822] lock_acquire+0x1b2/0x730 [ 97.037515][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.038466][ T822] ? check_flags+0x50/0x50 [ 97.039277][ T822] ? netif_skb_features+0x296/0x9c0 [ 97.040226][ T822] ? validate_xmit_skb+0x29/0xb10 [ 97.041151][ T822] _raw_spin_lock+0x30/0x70 [ 97.041977][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.042927][ T822] __dev_queue_xmit+0x1f52/0x2960 [ 97.043852][ T822] ? netdev_core_pick_tx+0x290/0x290 [ 97.044824][ T822] ? mark_held_locks+0xb7/0x120 [ 97.045712][ T822] ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 [ 97.046824][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.047771][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.048710][ T822] ? trace_hardirqs_on+0x41/0x120 [ 97.049626][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.050556][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.051509][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.052443][ T822] ? check_chain_key+0x244/0x5f0 [ 97.053352][ T822] ? rcu_read_lock_bh_held+0x56/0xa0 [ 97.054317][ T822] ? ip_finish_output2+0x6ea/0x2020 [ 97.055263][ T822] ? pneigh_lookup+0x410/0x410 [ 97.056135][ T822] ip_finish_output2+0x6ea/0x2020 [ ... ] Acked-by: Guillaume Nault <gnault@redhat.com> Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20201228152136.24215-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02bareudp: constify device_type declarationJonas Bonn
device_type may be declared as const. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Link: https://lore.kernel.org/r/20201202122324.564918-1-jonas@norrbonn.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09net: switch to dev_get_tstats64Heiner Kallweit
Replace ip_tunnel_get_stats64() with the new identical core function dev_get_tstats64(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-06bareudp: use dev_sw_netstats_rx_add()Fabian Frederick
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04tunnels: PMTU discovery support for directly bridged IP packetsStefano Brivio
It's currently possible to bridge Ethernet tunnels carrying IP packets directly to external interfaces without assigning them addresses and routes on the bridged network itself: this is the case for UDP tunnels bridged with a standard bridge or by Open vSwitch. PMTU discovery is currently broken with those configurations, because the encapsulation effectively decreases the MTU of the link, and while we are able to account for this using PMTU discovery on the lower layer, we don't have a way to relay ICMP or ICMPv6 messages needed by the sender, because we don't have valid routes to it. On the other hand, as a tunnel endpoint, we can't fragment packets as a general approach: this is for instance clearly forbidden for VXLAN by RFC 7348, section 4.3: VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may fragment encapsulated VXLAN packets due to the larger frame size. The destination VTEP MAY silently discard such VXLAN fragments. The same paragraph recommends that the MTU over the physical network accomodates for encapsulations, but this isn't a practical option for complex topologies, especially for typical Open vSwitch use cases. Further, it states that: Other techniques like Path MTU discovery (see [RFC1191] and [RFC1981]) MAY be used to address this requirement as well. Now, PMTU discovery already works for routed interfaces, we get route exceptions created by the encapsulation device as they receive ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and we already rebuild those messages with the appropriate MTU and route them back to the sender. Add the missing bits for bridged cases: - checks in skb_tunnel_check_pmtu() to understand if it's appropriate to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and RFC 4443 section 2.4 for ICMPv6. This function is already called by UDP tunnels - a new function generating those ICMP or ICMPv6 replies. We can't reuse icmp_send() and icmp6_send() as we don't see the sender as a valid destination. This doesn't need to be generic, as we don't cover any other type of ICMP errors given that we only provide an encapsulation function to the sender While at it, make the MTU check in skb_tunnel_check_pmtu() accurate: we might receive GSO buffers here, and the passed headroom already includes the inner MAC length, so we don't have to account for it a second time (that would imply three MAC headers on the wire, but there are just two). This issue became visible while bridging IPv6 packets with 4500 bytes of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50 bytes of encapsulation headroom, we would advertise MTU as 3950, and we would reject fragmented IPv6 datagrams of 3958 bytes size on the wire. We're exclusively dealing with network MTU here, though, so we could get Ethernet frames up to 3964 octets in that case. v2: - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern) - split IPv4/IPv6 functions (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28bareudp: forbid mixing IP and MPLS in multiproto modeGuillaume Nault
In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and IPv6 packets regardless of the bareudp ethertype. In practice, this let an IP tunnel send multicast MPLS packets, or an MPLS tunnel send IPv6 packets. We need to restrict the test further, so that the multiproto mode only enables * IPv6 for IPv4 tunnels, * or multicast MPLS for unicast MPLS tunnels. To improve clarity, the protocol validation is moved to its own function, where each logical test has its own condition. v2: s/ntohs/htons/ Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21bareudp: Reverted support to enable & disable rx metadata collectionMartin Varghese
The commit fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") breaks the the original(5.7) default behavior of bareudp module to collect RX metadadata at the receive. It was added to avoid the crash at the kernel neighbour subsytem when packet with metadata from bareudp is processed. But it is no more needed as the commit 394de110a733 ("net: Added pointer check for dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash. Fixes: fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28bareudp: Added attribute to enable & disable rx metadata collectionMartin
Metadata need not be collected in receive if the packet from bareudp device is not targeted to openvswitch. Signed-off-by: Martin <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-18bareudp: Fixed multiproto mode configurationMartin
Code to handle multiproto configuration is missing. Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS") Signed-off-by: Martin <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-16bareudp: Fixed configuration to avoid having garbage valuesMartin
Code to initialize the conf structure while gathering the configuration of the device was missing. Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Martin <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net: bareudp: avoid uninitialized variable warningArnd Bergmann
clang points out that building without IPv6 would lead to returning an uninitialized variable if a packet with family!=AF_INET is passed into bareudp_udp_encap_recv(): drivers/net/bareudp.c:139:6: error: variable 'err' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (family == AF_INET) ^~~~~~~~~~~~~~~~~ drivers/net/bareudp.c:146:15: note: uninitialized use occurs here if (unlikely(err)) { ^~~ include/linux/compiler.h:78:42: note: expanded from macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/net/bareudp.c:139:2: note: remove the 'if' if its condition is always true if (family == AF_INET) ^~~~~~~~~~~~~~~~~~~~~~ This cannot happen in practice, so change the condition in a way that gcc sees the IPv4 case as unconditionally true here. For consistency, change all the similar constructs in this file the same way, using "if(IS_ENABLED())" instead of #if IS_ENABLED()". Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11bareudp: Fixed bareudp receive handlingMartin Varghese
Reverted commit "2baecda bareudp: remove unnecessary udp_encap_enable() in bareudp_socket_create()" An explicit call to udp_encap_enable is needed as the setup_udp_tunnel_sock does not call udp_encap_enable if the if the socket is of type v6. Bareudp device uses v6 socket to receive v4 & v6 traffic CC: Taehee Yoo <ap420073@gmail.com> Fixes: 2baecda37f4e ("bareudp: remove unnecessary udp_encap_enable() in bareudp_socket_create()") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08bareudp: remove unnecessary udp_encap_enable() in bareudp_socket_create()Taehee Yoo
In the current code, udp_encap_enable() is called in bareudp_socket_create(). But, setup_udp_tunnel_sock() internally calls udp_encap_enable(). So, udp_encap_enable() is unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08bareudp: print error message when command failsTaehee Yoo
When bareudp netlink command fails, it doesn't print any error message. So, users couldn't know the exact reason. In order to tell the exact reason to the user, the extack error message is used in this patch. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08bareudp: add module aliasTaehee Yoo
In the current bareudp code, there is no module alias. So, RTNL couldn't load bareudp module automatically. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24bareudp: Fix uninitialized variable warnings.David S. Miller
drivers/net/bareudp.c: In function 'bareudp_xmit_skb': drivers/net/bareudp.c:346:9: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized] 346 | return err; | ^~~ drivers/net/bareudp.c: In function 'bareudp6_xmit_skb': drivers/net/bareudp.c:407:9: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized] 407 | return err; Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: Special handling for IP & MPLS.Martin Varghese
Special handling is needed in bareudp module for IP & MPLS as they support more than one ethertypes. MPLS has 2 ethertypes. 0x8847 for MPLS unicast and 0x8848 for MPLS multicast. While decapsulating MPLS packet from UDP packet the tunnel destination IP address is checked to determine the ethertype. The ethertype of the packet will be set to 0x8848 if the tunnel destination IP address is a multicast IP address. The ethertype of the packet will be set to 0x8847 if the tunnel destination IP address is a unicast IP address. IP has 2 ethertypes.0x0800 for IPV4 and 0x86dd for IPv6. The version field of the IP header tunnelled will be checked to determine the ethertype. This special handling to tunnel additional ethertypes will be disabled by default and can be enabled using a flag called multiproto. This flag can be used only with ethertypes 0x8847 and 0x0800. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: UDP tunnel encapsulation module for tunnelling different protocols like ↵Martin Varghese
MPLS, IP, NSH etc. The Bareudp tunnel module provides a generic L3 encapsulation tunnelling module for tunnelling different protocols like MPLS, IP,NSH etc inside a UDP tunnel. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>