summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-03-26ipmr: Make MFC fib notifiers commonYuval Mintz
Like vif notifications, move the notifier struct for MFC as well as its helpers into a common file; Currently they're only used by ipmr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26ipmr: Make vif fib notifiers commonYuval Mintz
The fib-notifiers are tightly coupled with the vif_device which is already common. Move the notifier struct definition and helpers to the common file; Currently they're only used by ipmr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Convert sunrpc_net_opsKirill Tkhai
These pernet_operations look similar to rpcsec_gss_net_ops, they just create and destroy another caches. So, they also can be async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Convert rpcsec_gss_net_opsKirill Tkhai
These pernet_operations initialize and destroy sunrpc_net_id refered per-net items. Only used global list is cache_list, and accesses already serialized. sunrpc_destroy_cache_detail() check for list_empty() without cache_list_lock, but when it's called from unregister_pernet_subsys(), there can't be callers in parallel, so we won't miss list_empty() in this case. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: sched, fix OOO packets with pfifo_fastJohn Fastabend
After the qdisc lock was dropped in pfifo_fast we allow multiple enqueue threads and dequeue threads to run in parallel. On the enqueue side the skb bit ooo_okay is used to ensure all related skbs are enqueued in-order. On the dequeue side though there is no similar logic. What we observe is with fewer queues than CPUs it is possible to re-order packets when two instances of __qdisc_run() are running in parallel. Each thread will dequeue a skb and then whichever thread calls the ndo op first will be sent on the wire. This doesn't typically happen because qdisc_run() is usually triggered by the same core that did the enqueue. However, drivers will trigger __netif_schedule() when queues are transitioning from stopped to awake using the netif_tx_wake_* APIs. When this happens netif_schedule() calls qdisc_run() on the same CPU that did the netif_tx_wake_* which is usually done in the interrupt completion context. This CPU is selected with the irq affinity which is unrelated to the enqueue operations. To resolve this we add a RUNNING bit to the qdisc to ensure only a single dequeue per qdisc is running. Enqueue and dequeue operations can still run in parallel and also on multi queue NICs we can still have a dequeue in-flight per qdisc, which is typically per CPU. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Use octal not symbolic permissionsJoe Perches
Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Drop NETDEV_UNREGISTER_FINALKirill Tkhai
Last user is gone after bdf5bd7f2132 "rds: tcp: remove register_netdevice_notifier infrastructure.", so we can remove this netdevice command. This allows to delete rtnl_lock() in netdev_run_todo(), which is hot path for net namespace unregistration. dev_change_net_namespace() and netdev_wait_allrefs() have rcu_barrier() before NETDEV_UNREGISTER_FINAL call, and the source commits say they were introduced to delemit the call with NETDEV_UNREGISTER, but this patch leaves them on the places, since they require additional analysis, whether we need in them for something else. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Make NETDEV_XXX commands enum { }Kirill Tkhai
This patch is preparation to drop NETDEV_UNREGISTER_FINAL. Since the cmd is used in usnic_ib_netdev_event_to_string() to get cmd name, after plain removing NETDEV_UNREGISTER_FINAL from everywhere, we'd have holes in event2str[] in this function. Instead of that, let's make NETDEV_XXX commands names available for everyone, and to define netdev_cmd_to_name() in the way we won't have to shaffle names after their numbers are changed. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net/utils: Introduce inet_addr_is_anySagi Grimberg
Can be useful to check INET_ANY address for both ipv4/ipv6 addresses. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-25tipc: tipc_disc_addr_trial_msg() can be statickbuild test robot
Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Jon Maloy jon.maloy@ericsson.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25ipv6: the entire IPv6 header chain must fit the first fragmentPaolo Abeni
While building ipv6 datagram we currently allow arbitrary large extheaders, even beyond pmtu size. The syzbot has found a way to exploit the above to trigger the following splat: kernel BUG at ./include/linux/skbuff.h:2073! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline] RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293 RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18 RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000 R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6 R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0 FS: 0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:969 [inline] udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073 udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046 __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136 SYSC_sendmmsg net/socket.c:2167 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2162 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4404c9 RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9 RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0 R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000 Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29 5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d 87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0 RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP: ffff8801bc18f0f0 As stated by RFC 7112 section 5: When a host fragments an IPv6 datagram, it MUST include the entire IPv6 Header Chain in the First Fragment. So this patch addresses the issue dropping datagrams with excessive extheader length. It also updates the error path to report to the calling socket nonnegative pmtu values. The issue apparently predates git history. v1 -> v2: cleanup error path, as per Eric's suggestion Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25netlink: make sure nladdr has correct size in netlink_connect()Alexander Potapenko
KMSAN reports use of uninitialized memory in the case when |alen| is smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't fully copied from the userspace. Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25net/ipv4: disable SMC TCP option with SYN CookiesHans Wippel
Currently, the SMC experimental TCP option in a SYN packet is lost on the server side when SYN Cookies are active. However, the corresponding SYNACK sent back to the client contains the SMC option. This causes an inconsistent view of the SMC capabilities on the client and server. This patch disables the SMC option in the SYNACK when SYN Cookies are active to avoid this issue. Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC") Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25net: permit skb_segment on head_frag frag_list skbYonghong Song
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525 #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7 #4 [ffff883ffef03668] die at ffffffff8101deb2 #5 [ffff883ffef03698] do_trap at ffffffff8101a700 #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0 #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab [exception RIP: skb_segment+3044] RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216 RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011 RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500 RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80 R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7 Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise userspace hits EOPNOTSUPP with valid rules using the meter statement, from Florian Westphal. 2) If you send a batch that flushes the existing ruleset (that contains a NAT chain) and the new ruleset definition comes with a new NAT chain, don't bogusly hit EBUSY. Also from Florian. 3) Missing netlink policy attribute validation, from Florian. 4) Detach conntrack template from skbuff if IP_NODEFRAG is set on, from Paolo Abeni. 5) Cache device names in flowtable object, otherwise we may end up walking over devices going aways given no rtnl_lock is held. 6) Fix incorrect net_device ingress with ingress hooks. 7) Fix crash when trying to read more data than available in UDP packets from the nf_socket infrastructure, from Subash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}Subash Abhinov Kasiviswanathan
skb_header_pointer will copy data into a buffer if data is non linear, otherwise it will return a pointer in the linear section of the data. nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later accesses memory within the size of tcphdr (th->doff) in case of TCP packets. This causes a crash when running with KASAN with the following call stack - BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971 CPU: 2 PID: 28971 Comm: syz-executor Tainted: G B W O 4.9.65+ #1 Call trace: [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76 [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226 [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51 [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248 [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline] [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371 [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372 [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline] [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739 [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline] [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178 Fix this by copying data into appropriate size headers based on protocol. Fixes: a583636a83ea ("inet: refactor inet[6]_lookup functions to take skb") Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-24batman-adv: fix packet loss for broadcasted DHCP packets to a serverLinus Lüssing
DHCP connectivity issues can currently occur if the following conditions are met: 1) A DHCP packet from a client to a server 2) This packet has a multicast destination 3) This destination has a matching entry in the translation table (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03 for IPv6) 4) The orig-node determined by TT for the multicast destination does not match the orig-node determined by best-gateway-selection In this case the DHCP packet will be dropped. The "gateway-out-of-range" check is supposed to only be applied to unicasted DHCP packets to a specific DHCP server. In that case dropping the the unicasted frame forces the client to retry via a broadcasted one, but now directed to the new best gateway. A DHCP packet with broadcast/multicast destination is already ensured to always be delivered to the best gateway. Dropping a multicasted DHCP packet here will only prevent completing DHCP as there is no other fallback. So far, it seems the unicast check was implicitly performed by expecting the batadv_transtable_search() to return NULL for multicast destinations. However, a multicast address could have always ended up in the translation table and in fact is now common. To fix this potential loss of a DHCP client-to-server packet to a multicast address this patch adds an explicit multicast destination check to reliably bail out of the gateway-out-of-range check for such destinations. The issue and fix were tested in the following three node setup: - Line topology, A-B-C - A: gateway client, DHCP client - B: gateway server, hop-penalty increased: 30->60, DHCP server - C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF Without this patch, A would never transmit its DHCP Discover packet due to an always "out-of-range" condition. With this patch, a full DHCP handshake between A and B was possible again. Fixes: be7af5cf9cae ("batman-adv: refactoring gateway handling code") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-24batman-adv: fix multicast-via-unicast transmission with AP isolationLinus Lüssing
For multicast frames AP isolation is only supposed to be checked on the receiving nodes and never on the originating one. Furthermore, the isolation or wifi flag bits should only be intepreted as such for unicast and never multicast TT entries. By injecting flags to the multicast TT entry claimed by a single target node it was verified in tests that this multicast address becomes unreachable, leading to packet loss. Omitting the "src" parameter to the batadv_transtable_search() call successfully skipped the AP isolation check and made the target reachable again. Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-23net/sched: act_vlan: declare push_vid with host byte orderDavide Caratti
use u16 in place of __be16 to suppress the following sparse warnings: net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types) net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid net/sched/act_vlan.c:150:26: got unsigned short net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types) net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23net/sched: remove tcf_idr_cleanup()Davide Caratti
tcf_idr_cleanup() is no more used, so remove it. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23ipv6: fix possible deadlock in rt6_age_examine_exception()Eric Dumazet
syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception() rt6_age_examine_exception() is called while rt6_exception_lock is held. This lock is the lower one in the lock hierarchy, thus we can not call dst_neigh_lookup() function, as it can fallback to neigh_create() We should instead do a pure RCU lookup. As a bonus we avoid a pair of atomic operations on neigh refcount. [1] WARNING: possible circular locking dependency detected 4.16.0-rc4+ #277 Not tainted syz-executor7/4015 is trying to acquire lock: (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 but task is already holding lock: (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&tbl->lock){++-.}: __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528 neigh_create include/net/neighbour.h:315 [inline] ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228 dst_neigh_lookup include/net/dst.h:405 [inline] rt6_age_examine_exception net/ipv6/route.c:1609 [inline] rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645 fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033 fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919 fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845 fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893 fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970 __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986 fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline] fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053 ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #2 (rt6_exception_lock){+.-.}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:315 [inline] rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367 fib6_del_route net/ipv6/ip6_fib.c:1677 [inline] fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761 __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980 ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993 __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332 ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline] ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200 inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433 sock_release+0x8d/0x1e0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x327/0x7e0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x199/0x270 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x9bb/0x1ad0 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:968 get_signal+0x73a/0x16d0 kernel/signal.c:2469 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #1 (&(&tb->tb6_lock)->rlock){+.-.}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:315 [inline] __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007 ip6_route_add+0x141/0x190 net/ipv6/route.c:2955 addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359 fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline] addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline] addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357 rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965 rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641 netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659 netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline] netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047 __sys_sendmsg+0xe5/0x210 net/socket.c:2081 SYSC_sendmsg net/socket.c:2092 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2088 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> #0 (&ndev->lock){++--}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392 pneigh_ifdown net/core/neighbour.c:695 [inline] neigh_ifdown+0x149/0x250 net/core/neighbour.c:294 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 other info that might help us debug this: Chain exists of: &ndev->lock --> rt6_exception_lock --> &tbl->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&tbl->lock); lock(rt6_exception_lock); lock(&tbl->lock); lock(&ndev->lock); *** DEADLOCK *** 2 locks held by syz-executor7/4015: #0: (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292 stack backtrace: CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223 check_prev_add kernel/locking/lockdep.c:1863 [inline] check_prevs_add kernel/locking/lockdep.c:1976 [inline] validate_chain kernel/locking/lockdep.c:2417 [inline] __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline] _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392 pneigh_ifdown net/core/neighbour.c:695 [inline] neigh_ifdown+0x149/0x250 net/core/neighbour.c:294 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707 call_netdevice_notifiers net/core/dev.c:1725 [inline] __dev_notify_flags+0x262/0x430 net/core/dev.c:6960 dev_change_flags+0xf5/0x140 net/core/dev.c:6994 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066 sock_do_ioctl+0xef/0x390 net/socket.c:957 sock_ioctl+0x36b/0x610 net/socket.c:1081 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: obtain node identity from interface by defaultJon Maloy
Selecting and explicitly configuring a TIPC node identity may be unwanted in some cases. In this commit we introduce a default setting if the identity has not been set at the moment the first bearer is enabled. We do this by using a raw copy of a unique identifier from the used interface: MAC address in the case of an L2 bearer, IPv4/IPv6 address in the case of a UDP bearer. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: handle collisions of 32-bit node address hash valuesJon Maloy
When a 32-bit node address is generated from a 128-bit identifier, there is a risk of collisions which must be discovered and handled. We do this as follows: - We don't apply the generated address immediately to the node, but do instead initiate a 1 sec trial period to allow other cluster members to discover and handle such collisions. - During the trial period the node periodically sends out a new type of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast, to all the other nodes in the cluster. - When a node is receiving such a message, it must check that the presented 32-bit identifier either is unused, or was used by the very same peer in a previous session. In both cases it accepts the request by not responding to it. - If it finds that the same node has been up before using a different address, it responds with a DSC_TRIAL_FAIL_MSG containing that address. - If it finds that the address has already been taken by some other node, it generates a new, unused address and returns it to the requester. - During the trial period the requesting node must always be prepared to accept a failure message, i.e., a message where a peer suggests a different (or equal) address to the one tried. In those cases it must apply the suggested value as trial address and restart the trial period. This algorithm ensures that in the vast majority of cases a node will have the same address before and after a reboot. If a legacy user configures the address explicitly, there will be no trial period and messages, so this protocol addition is completely backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: add 128-bit node identifierJon Maloy
We add a 128-bit node identity, as an alternative to the currently used 32-bit node address. For the sake of compatibility and to minimize message header changes we retain the existing 32-bit address field. When not set explicitly by the user, this field will be filled with a hash value generated from the much longer node identity, and be used as a shorthand value for the latter. We permit either the address or the identity to be set by configuration, but not both, so when the address value is set by a legacy user the corresponding 128-bit node identity is generated based on the that value. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: remove direct accesses to own_addr field in struct tipc_netJon Maloy
As a preparation to changing the addressing structure of TIPC we replace all direct accesses to the tipc_net::own_addr field with the function dedicated for this, tipc_own_addr(). There are no changes to program logics in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: allow closest-first lookup algorithm when legacy address is configuredJon Maloy
The removal of an internal structure of the node address has an unwanted side effect. - Currently, if a user is sending an anycast message with destination domain 0, the tipc_namebl_translate() function will use the 'closest- first' algorithm to first look for a node local destination, and only when no such is found, will it resort to the cluster global 'round- robin' lookup algorithm. - Current users can get around this, and enforce unconditional use of global round-robin by indicating a destination as Z.0.0 or Z.C.0. - This option disappears when we make the node address flat, since the lookup algorithm has no way of recognizing this case. So, as long as there are node local destinations, the algorithm will always select one of those, and there is nothing the sender can do to change this. We solve this by eliminating the 'closest-first' option, which was never a good idea anyway, for non-legacy users, but only for those. To distinguish between legacy users and non-legacy users we introduce a new flag 'legacy_addr_format' in struct tipc_core, to be set when the user configures a legacy-style Z.C.N node address. Hence, when a legacy user indicates a zero lookup domain 'closest-first' is selected, and in all other cases we use 'round-robin'. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: remove restrictions on node address valuesJon Maloy
Nominally, TIPC organizes network nodes into a three-level network hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This hierarchy is reflected in the node address format, - it is sub-divided into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id. However, the 'zone' and 'cluster' levels have in reality never been fully implemented,and never will be. The result of this has been that the first 20 bits the node identity structure have been wasted, and the usable node identity range within a cluster has been limited to 12 bits. This is starting to become a problem. In the following commits, we will need to be able to connect between nodes which are using the whole 32-bit value space of the node address. We therefore remove the restrictions on which values can be assigned to node identity, -it is from now on only a 32-bit integer with no assumed internal structure. Isolation between clusters is now achieved only by setting different values for the 'network id' field used during neighbor discovery, in practice leading to the latter becoming the new cluster identity. The rules for accepting discovery requests/responses from neighboring nodes now become: - If the user is using legacy address format on both peers, reception of discovery messages is subject to the legacy lookup domain check in addition to the cluster id check. - Otherwise, the discovery request/response is always accepted, provided both peers have the same network id. This secures backwards compatibility for users who have been using zone or cluster identities as cluster separators, instead of the intended 'network id'. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: some cleanups in the file discover.cJon Maloy
To facilitate the coming changes in the neighbor discovery functionality we make some renaming and refactoring of that code. The functional changes in this commit are trivial, e.g., that we move the message sending call in tipc_disc_timeout() outside the spinlock protected region. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tipc: refactor function tipc_enable_bearer()Jon Maloy
As a preparation for the next commits we try to reduce the footprint of the function tipc_enable_bearer(), while hopefully making is simpler to follow. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23net: Convert rxrpc_net_opsKirill Tkhai
These pernet_operations modifies rxrpc_net_id-pointed per-net entities. There is external link to AF_RXRPC in fs/afs/Kconfig, but it seems there is no other pernet_operations interested in that per-net entities. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23net: Convert udp_sysctl_opsKirill Tkhai
These pernet_operations just initialize udp4 defaults. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23ip_tunnel: Emit events for post-register MTU changesPetr Machata
For tunnels created with IFLA_MTU, MTU of the netdevice is set by rtnl_create_link() (called from rtnl_newlink()) before the device is registered. However without IFLA_MTU that's not done. rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which via ip_tunnel_newlink() calls register_netdevice(), and that emits NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the MTU of 0. After ip_tunnel_newlink() corrects the MTU after registering the netdevice, but since there's no event, the listeners don't get to know about the MTU until something else happens--such as a NETDEV_UP event. That's not ideal. So instead of setting the MTU directly, go through dev_set_mtu(), which takes care of distributing the necessary NETDEV_PRECHANGEMTU and NETDEV_CHANGEMTU events. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23net: bridge: fix direct access to bridge vlan_enabled and use helperNikolay Aleksandrov
We need to use br_vlan_enabled() helper otherwise we'll break builds without bridge vlans: net/bridge//br_if.c: In function ‘br_mtu’: net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no member named ‘vlan_enabled’ if (br->vlan_enabled) ^ net/bridge//br_if.c:462:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o' failed Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: RX path for ktlsDave Watson
Add rx path for tls software implementation. recvmsg, splice_read, and poll implemented. An additional sockopt TLS_RX is added, with the same interface as TLS_TX. Either TLX_RX or TLX_TX may be provided separately, or together (with two different setsockopt calls with appropriate keys). Control messages are passed via CMSG in a similar way to transmit. If no cmsg buffer is passed, then only application data records will be passed to userspace, and EIO is returned for other types of alerts. EBADMSG is passed for decryption errors, and EMSGSIZE is passed for framing too big, and EBADMSG for framing too small (matching openssl semantics). EINVAL is returned for TLS versions that do not match the original setsockopt call. All are unrecoverable. strparser is used to parse TLS framing. Decryption is done directly in to userspace buffers if they are large enough to support it, otherwise sk_cow_data is called (similar to ipsec), and buffers are decrypted in place and copied. splice_read always decrypts in place, since no buffers are provided to decrypt in to. sk_poll is overridden, and only returns POLLIN if a full TLS message is received. Otherwise we wait for strparser to finish reading a full frame. Actual decryption is only done during recvmsg or splice_read calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Refactor variable namesDave Watson
Several config variables are prefixed with tx, drop the prefix since these will be used for both tx and rx. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Pass error code explicitly to tls_err_abortDave Watson
Pass EBADMSG explicitly to tls_err_abort. Receive path will pass additional codes - EMSGSIZE if framing is larger than max TLS record size, EINVAL if TLS version mismatch. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Move cipher info to a separate structDave Watson
Separate tx crypto parameters to a separate cipher_context struct. The same parameters will be used for rx using the same struct. tls_advance_record_sn is modified to only take the cipher info. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Generalize zerocopy_from_iterDave Watson
Refactor zerocopy_from_iter to take arguments for pages and size, such that it can be used for both tx and rx. RX will also support zerocopy direct to output iter, as long as the full message can be copied at once (a large enough userspace buffer was provided). Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23bridge: Allow max MTU when multiple VLANs presentChas Williams
If the bridge is allowing multiple VLANs, some VLANs may have different MTUs. Instead of choosing the minimum MTU for the bridge interface, choose the maximum MTU of the bridge members. With this the user only needs to set a larger MTU on the member ports that are participating in the large MTU VLANS. Signed-off-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23mac80211: notify driver for change in multicast ratesPradeep Kumar Chitrapu
With drivers implementing rate control in driver or firmware rate_control_send_low() may not get called, and thus the driver needs to know about changes in the multicast rate. Add and use a new BSS change flag for this. Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-23xfrm: Fix transport mode skb control buffer usage.Steffen Klassert
A recent commit introduced a new struct xfrm_trans_cb that is used with the sk_buff control buffer. Unfortunately it placed the structure in front of the control buffer and overlooked that the IPv4/IPv6 control buffer is still needed for some layer 4 protocols. As a result the IPv4/IPv6 control buffer is overwritten with this structure. Fix this by setting a apropriate header in front of the structure. Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-22net: Replace ip_ra_lock with per-net mutexKirill Tkhai
Since ra_chain is per-net, we may use per-net mutexes to protect them in ip_ra_control(). This improves scalability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Make ip_ra_chain per struct netKirill Tkhai
This is optimization, which makes ip_call_ra_chain() iterate less sockets to find the sockets it's looking for. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Revert "ipv4: fix a deadlock in ip_ra_control"Kirill Tkhai
This reverts commit 1215e51edad1. Since raw_close() is used on every RAW socket destruction, the changes made by 1215e51edad1 scale sadly. This clearly seen on endless unshare(CLONE_NEWNET) test, and cleanup_net() kwork spends a lot of time waiting for rtnl_lock() introduced by this commit. Previous patch moved IP_ROUTER_ALERT out of rtnl_lock(), so we revert this patch. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Move IP_ROUTER_ALERT out of lock_sock(sk)Kirill Tkhai
ip_ra_control() does not need sk_lock. Who are the another users of ip_ra_chain? ip_mroute_setsockopt() doesn't take sk_lock, while parallel IP_ROUTER_ALERT syscalls are synchronized by ip_ra_lock. So, we may move this command out of sk_lock. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Revert "ipv4: get rid of ip_ra_lock"Kirill Tkhai
This reverts commit ba3f571d5dde. The commit was made after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control", and killed ip_ra_lock, which became useless after rtnl_lock() made used to destroy every raw ipv4 socket. This scales very bad, and next patch in series reverts 1215e51edad1. ip_ra_lock will be used again. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22gre: fix TUNNEL_SEQ bit check on sequence numberingColin Ian King
The current logic of flags | TUNNEL_SEQ is always non-zero and hence sequence numbers are always incremented no matter the setting of the TUNNEL_SEQ bit. Fix this by using & instead of |. Detected by CoverityScan, CID#1466039 ("Operands don't affect result") Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: step sk->sk_drops when rcv buffer is fullGhantaKrishnamurthy MohanKrishna
Currently when tipc is unable to queue a received message on a socket, the message is rejected back to the sender with error TIPC_ERR_OVERLOAD. However, the application on this socket has no knowledge about these discards. In this commit, we try to step the sk_drops counter when tipc is unable to queue a received message. Export sk_drops using tipc socket diagnostics. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: implement socket diagnostics for AF_TIPCGhantaKrishnamurthy MohanKrishna
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko). The following are key design considerations: - config TIPC_DIAG has default y, like INET_DIAG. - only requests with flag NLM_F_DUMP is supported (dump all). - tipc_sock_diag_req message is introduced to send filter parameters. - the response attributes are of TLV, some nested. To avoid exposing data structures between diag and tipc modules and avoid code duplication, the following additions are required: - export tipc_nl_sk_walk function to reuse socket iterator. - export tipc_sk_fill_sock_diag to fill the tipc diag attributes. - create a sock_diag response message in __tipc_add_sock_diag defined in diag.c and use the above exported tipc_sk_fill_sock_diag to fill response. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>