summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-05-24ipv6: Move exception bucket to fib6_nhDavid Ahern
Similar to the pcpu routes exceptions are really per nexthop, so move rt6i_exception_bucket from fib6_info to fib6_nh. To avoid additional increases to the size of fib6_nh for a 1-bit flag, use the lowest bit in the allocated memory pointer for the flushed flag. Add helpers for retrieving the bucket pointer to mask off the flag. The cleanup of the exception bucket is moved to fib6_nh_release. fib6_nh_flush_exceptions can now be called from 2 contexts: 1. deleting a fib entry 2. deleting a fib6_nh For 1., fib6_nh_flush_exceptions is called for a specific fib6_info that is getting deleted. All exceptions in the cache using the entry are deleted. For 2, the fib6_nh itself is getting destroyed so fib6_nh_flush_exceptions is called for a NULL fib6_info which means flush all entries. The pmtu.sh selftest exercises the affected code paths - from creating exceptions to cleaning them up on device delete. All tests pass without any rcu locking or memleak warnings. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24ipv6: Refactor exception functionsDavid Ahern
Before moving exception bucket from fib6_info to fib6_nh, refactor rt6_flush_exceptions, rt6_remove_exception_rt, rt6_mtu_change_route, and rt6_update_exception_stamp_rt. In all 3 cases, move the primary logic into a new helper that starts with fib6_nh_. The latter 3 functions still take a fib6_info; this will be changed to fib6_nh in the next patch. In the case of rt6_mtu_change_route, move the fib6_metric_locked out as a standalone check - no need to call the new function if the fib entry has the mtu locked. Also, add fib6_info to rt6_mtu_change_arg as a way of passing the fib entry to the new helper. No functional change intended. The goal here is to make the next patch easier to review by moving existing lookup logic for each to new helpers. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24ipv6: Refactor fib6_drop_pcpu_fromDavid Ahern
Move the existing pcpu walk in fib6_drop_pcpu_from to a new helper, __fib6_drop_pcpu_from, that can be invoked per fib6_nh with a reference to the from entries that need to be evicted. If the passed in 'from' is non-NULL then only entries associated with that fib6_info are removed (e.g., case where fib entry is deleted); if the 'from' is NULL are entries are flushed (e.g., fib6_nh is deleted). For fib6_info entries with builtin fib6_nh (ie., current code) there is no change in behavior. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24ipv6: Move pcpu cached routes to fib6_nhDavid Ahern
rt6_info are specific instances of a fib entry and are tied to a device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib entries have separate fib6_info for each nexthop in a multipath route, so the location of the pcpu cache in the fib6_info struct worked. However, with nexthop objects a fib6_info can point to a set of nexthops (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu cache needs to be moved to the fib6_nh struct so the cached entries are local to the nexthop specification used to create the rt6_info. Initialization and free of the pcpu entries moved to fib6_nh_init and fib6_nh_release. Change in location only, from fib6_info down to fib6_nh; no other functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104Thomas Gleixner
Based on 1 normalized pattern(s): this sctp implementation is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this sctp implementation is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with gnu cc see the file copying if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 42 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103Thomas Gleixner
Based on 1 normalized pattern(s): the sctp implementation is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version the sctp implementation is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with gnu cc see the file copying if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091649.592169384@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 93Thomas Gleixner
Based on 1 normalized pattern(s): this code is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075212.233647300@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 77Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation or any later at your option extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 5 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075210.769496418@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 72Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 or any later at your option as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520071859.749329557@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 62Thomas Gleixner
Based on 1 normalized pattern(s): released under the gpl version 2 or later and 1 additional normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520071858.828691433@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 441 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520071858.739733335@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 53Thomas Gleixner
Based on 1 normalized pattern(s): this code may be copied under the gpl v 2 or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520071858.029737698@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 41Thomas Gleixner
Based on 1 normalized pattern(s): this module is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 18 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170858.008906948@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public licence as published by the free software foundation either version 2 of the licence or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 114 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24{nl,mac}80211: allow 4addr AP operation on crypto controlled devicesManikanta Pubbisetty
As per the current design, in the case of sw crypto controlled devices, it is the device which advertises the support for AP/VLAN iftype based on it's ability to tranmsit packets encrypted in software (In VLAN functionality, group traffic generated for a specific VLAN group is always encrypted in software). Commit db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") has introduced this change. Since 4addr AP operation also uses AP/VLAN iftype, this conditional way of advertising AP/VLAN support has broken 4addr AP mode operation on crypto controlled devices which do not support VLAN functionality. In the case of ath10k driver, not all firmwares have support for VLAN functionality but all can support 4addr AP operation. Because AP/VLAN support is not advertised for these devices, 4addr AP operations are also blocked. Fix this by allowing 4addr operation on devices which do not support AP/VLAN iftype but can support 4addr AP operation (decision is based on the wiphy flag WIPHY_FLAG_4ADDR_AP). Cc: stable@vger.kernel.org Fixes: db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-05-24mac80211: fix rate reporting inside cfg80211_calculate_bitrate_he()John Crispin
The reported rate is not scaled down correctly. After applying this patch, the function will behave just like the v/ht equivalents. Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-05-24mac80211: remove set but not used variable 'old'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: net/mac80211/key.c: In function 'ieee80211_set_tx_key': net/mac80211/key.c:271:24: warning: variable 'old' set but not used [-Wunused-but-set-variable] It is not used since introduction in commit 96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-05-24mac80211: handle deauthentication/disassociation from TDLS peerYu Wang
When receiving a deauthentication/disassociation frame from a TDLS peer, a station should not disconnect the current AP, but only disable the current TDLS link if it's enabled. Without this change, a TDLS issue can be reproduced by following the steps as below: 1. STA-1 and STA-2 are connected to AP, bidirection traffic is running between STA-1 and STA-2. 2. Set up TDLS link between STA-1 and STA-2, stay for a while, then teardown TDLS link. 3. Repeat step #2 and monitor the connection between STA and AP. During the test, one STA may send a deauthentication/disassociation frame to another, after TDLS teardown, with reason code 6/7, which means: Class 2/3 frame received from nonassociated STA. On receive this frame, the receiver STA will disconnect the current AP and then reconnect. It's not a expected behavior, purpose of this frame should be disabling the TDLS link, not the link with AP. Cc: stable@vger.kernel.org Signed-off-by: Yu Wang <yyuwang@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Fix crash when dumping rules after conversion to RCU, from Florian Westphal. 2) Fix incorrect hook reinjection from nf_queue in case NF_REPEAT, from Jagdish Motwani. 3) Fix check for route existence in fib extension, from Phil Sutter. 4) Fix use after free in ip_vs_in() hook, from YueHaibing. 5) Check for veth existence from netfilter selftests, from Jeffrin Jose T. 6) Checksum corruption in UDP NAT helpers due to typo, from Florian Westphal. 7) Pass up packets to classic forwarding path regardless of IPv4 DF bit, patch for the flowtable infrastructure from Florian. 8) Set liberal TCP tracking for flows that are placed in the flowtable, in case they need to go back to classic forwarding path, also from Florian. 9) Don't add flow with sequence adjustment to flowtable, from Florian. 10) Skip IPv4 options from IPv6 datapath in flowtable, from Florian. 11) Add selftest for the flowtable infrastructure, from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23hsr: fix don't prune the master node from the node_dbAndreas Oetken
Don't prune the master node in the hsr_prune_nodes function. Neither time_in[HSR_PT_SLAVE_A] nor time_in[HSR_PT_SLAVE_B] will ever be updated by hsr_register_frame_in for the master port. Thus, the master node will be repeatedly pruned leading to repeated packet loss. This bug never appeared because the hsr_prune_nodes function was only called once. Since commit 5150b45fd355 ("net: hsr: Fix node prune function for forget time expiry") this issue is fixed unveiling the issue described above. Fixes: 5150b45fd355 ("net: hsr: Fix node prune function for forget time expiry") Signed-off-by: Andreas Oetken <andreas.oetken@siemens.com> Tested-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23devlink: add warning in case driver does not set port typeJiri Pirko
Prevent misbehavior of drivers who would not set port type for longer period of time. Drivers should always set port type. Do WARN if that happens. Note that it is perfectly fine to temporarily not have the type set, during initialization and port type change. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4/igmp: fix build error if !CONFIG_IP_MULTICASTEric Dumazet
ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST Fixes: 3580d04aa674 ("ipv4/igmp: fix another memory leak in igmpv3_del_delrec()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4/igmp: fix another memory leak in igmpv3_del_delrec()Eric Dumazet
syzbot reported memory leaks [1] that I have back tracked to a missing cleanup from igmpv3_del_delrec() when (im->sfmode != MCAST_INCLUDE) Add ip_sf_list_clear_all() and kfree_pmc() helpers to explicitely handle the cleanups before freeing. [1] BUG: memory leak unreferenced object 0xffff888123e32b00 (size 64): comm "softirq", pid 0, jiffies 4294942968 (age 8.010s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 e0 00 00 01 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000006105011b>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<000000006105011b>] slab_post_alloc_hook mm/slab.h:439 [inline] [<000000006105011b>] slab_alloc mm/slab.c:3326 [inline] [<000000006105011b>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000004bba8073>] kmalloc include/linux/slab.h:547 [inline] [<000000004bba8073>] kzalloc include/linux/slab.h:742 [inline] [<000000004bba8073>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline] [<000000004bba8073>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085 [<00000000a46a65a0>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475 [<000000005956ca89>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:957 [<00000000848e2d2f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246 [<00000000b9db185c>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616 [<000000003028e438>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130 [<0000000015b65589>] __sys_setsockopt+0x98/0x120 net/socket.c:2078 [<00000000ac198ef0>] __do_sys_setsockopt net/socket.c:2089 [inline] [<00000000ac198ef0>] __se_sys_setsockopt net/socket.c:2086 [inline] [<00000000ac198ef0>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086 [<000000000a770437>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<00000000d3adb93b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 9c8bb163ae78 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22hv_sock: perf: loop in send() to maximize bandwidthSunil Muthuswamy
Currently, the hv_sock send() iterates once over the buffer, puts data into the VMBUS channel and returns. It doesn't maximize on the case when there is a simultaneous reader draining data from the channel. In such a case, the send() can maximize the bandwidth (and consequently minimize the cpu cycles) by iterating until the channel is found to be full. Perf data: Total Data Transfer: 10GB/iteration Single threaded reader/writer, Linux hvsocket writer with Windows hvsocket reader Packet size: 64KB CPU sys time was captured using the 'time' command for the writer to send 10GB of data. 'Send Buffer Loop' is with the patch applied. The values below are over 10 iterations. |--------------------------------------------------------| | | Current | Send Buffer Loop | |--------------------------------------------------------| | | Throughput | CPU sys | Throughput | CPU sys | | | (MB/s) | time (s) | (MB/s) | time (s) | |--------------------------------------------------------| | Min | 407 | 7.048 | 401 | 5.958 | |--------------------------------------------------------| | Max | 455 | 7.563 | 542 | 6.993 | |--------------------------------------------------------| | Avg | 440 | 7.411 | 451 | 6.639 | |--------------------------------------------------------| | Median | 446 | 7.417 | 447 | 6.761 | |--------------------------------------------------------| Observation: 1. The avg throughput doesn't really change much with this change for this scenario. This is most probably because the bottleneck on throughput is somewhere else. 2. The average system (or kernel) cpu time goes down by 10%+ with this change, for the same amount of data transfer. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22hv_sock: perf: Allow the socket buffer size options to influence the actual ↵Sunil Muthuswamy
socket buffers Currently, the hv_sock buffer size is static and can't scale to the bandwidth requirements of the application. This change allows the applications to influence the socket buffer sizes using the SO_SNDBUF and the SO_RCVBUF socket options. Few interesting points to note: 1. Since the VMBUS does not allow a resize operation of the ring size, the socket buffer size option should be set prior to establishing the connection for it to take effect. 2. Setting the socket option comes with the cost of that much memory being reserved/allocated by the kernel, for the lifetime of the connection. Perf data: Total Data Transfer: 1GB Single threaded reader/writer Results below are summarized over 10 iterations. Linux hvsocket writer + Windows hvsocket reader: |---------------------------------------------------------------------------------------------| |Packet size -> | 128B | 1KB | 4KB | 64KB | |---------------------------------------------------------------------------------------------| |SO_SNDBUF size | | Throughput in MB/s (min/max/avg/median): | | v | | |---------------------------------------------------------------------------------------------| | Default | 109/118/114/116 | 636/774/701/700 | 435/507/480/476 | 410/491/462/470 | | 16KB | 110/116/112/111 | 575/705/662/671 | 749/900/854/869 | 592/824/692/676 | | 32KB | 108/120/115/115 | 703/823/767/772 | 718/878/850/866 | 1593/2124/2000/2085 | | 64KB | 108/119/114/114 | 592/732/683/688 | 805/934/903/911 | 1784/1943/1862/1843 | |---------------------------------------------------------------------------------------------| Windows hvsocket writer + Linux hvsocket reader: |---------------------------------------------------------------------------------------------| |Packet size -> | 128B | 1KB | 4KB | 64KB | |---------------------------------------------------------------------------------------------| |SO_RCVBUF size | | Throughput in MB/s (min/max/avg/median): | | v | | |---------------------------------------------------------------------------------------------| | Default | 69/82/75/73 | 313/343/333/336 | 418/477/446/445 | 659/701/676/678 | | 16KB | 69/83/76/77 | 350/401/375/382 | 506/548/517/516 | 602/624/615/615 | | 32KB | 62/83/73/73 | 471/529/496/494 | 830/1046/935/939 | 944/1180/1070/1100 | | 64KB | 64/70/68/69 | 467/533/501/497 | 1260/1590/1430/1431 | 1605/1819/1670/1660 | |---------------------------------------------------------------------------------------------| Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv6: Fix redirect with VRFDavid Ahern
IPv6 redirect is broken for VRF. __ip6_route_redirect walks the FIB entries looking for an exact match on ifindex. With VRF the flowi6_oif is updated by l3mdev_update_flow to the l3mdev index and the FLOWI_FLAG_SKIP_NH_OIF set in the flags to tell the lookup to skip the device match. For redirects the device match is requires so use that flag to know when the oif needs to be reset to the skb device index. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22neighbor: Add tracepoint to __neigh_createDavid Ahern
Add tracepoint to __neigh_create to enable debugging of new entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net: Set strict_start_type for routes and rulesDavid Ahern
New userspace on an older kernel can send unknown and unsupported attributes resulting in an incompelete config which is almost always wrong for routing (few exceptions are passthrough settings like the protocol that installed the route). Set strict_start_type in the policies for IPv4 and IPv6 routes and rules to detect new, unsupported attributes and fail the route add. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: Rename and export nh_update_mtuDavid Ahern
Rename nh_update_mtu to fib_nhc_update_mtu and export for use by the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_info_update_nh_saddrDavid Ahern
Add scope as input argument versus relying on fib_info reference in fib_nh, and export fib_info_update_nh_saddr. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_flushDavid Ahern
As nexthops are deleted, fib entries referencing it are marked dead. Export fib_flush so those entries can be removed in a timely manner. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_check_nhDavid Ahern
Change fib_check_nh to take net, table and scope as input arguments over struct fib_config and export for use by nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: Add function to send route updatesDavid Ahern
Add fib_info_notify_update to walk the fib and send RTM_NEWROUTE notifications with NLM_F_REPLACE set for entries linked to a fib_info that have nh_updated flag set. This helper will be used by the nexthop code to notify userspace of routes that are impacted when a nexthop config is updated via replace. The new function and its helper are similar to how fib_flush and fib_table_flush work for address delete and link down events. This notification is needed for legacy apps that do not understand the new nexthop object. Apps that are nexthop aware can use the RTA_NH_ID attribute in the route notification to just ignore it. In the future this should be wrapped in a sysctl to allow OS'es that are fully updated to avoid the notificaton storm. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv6: export function to send route updatesDavid Ahern
Add fib6_rt_update to send RTM_NEWROUTE with NLM_F_REPLACE set. This helper will be used by the nexthop code to notify userspace of routes that are impacted when a nexthop config is updated via replace. This notification is needed for legacy apps that do not understand the new nexthop object. Apps that are nexthop aware can use the RTA_NH_ID attribute in the route notification to just ignore it. In the future this should be wrapped in a sysctl to allow OS'es that are fully updated to avoid the notificaton storm. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv6: Add hook to bump sernum for a route to stubsDavid Ahern
Add hook to ipv6 stub to bump the sernum up to the root node for a route. This is needed by the nexthop code when a nexthop config changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv6: Add delete route hook to stubsDavid Ahern
Add ip6_del_rt to the IPv6 stub. The hook is needed by the nexthop code to remove entries linked to a nexthop that is getting deleted. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net/tls: don't ignore netdev notifications if no TLS featuresJakub Kicinski
On device surprise removal path (the notifier) we can't bail just because the features are disabled. They may have been enabled during the lifetime of the device. This bug leads to leaking netdev references and use-after-frees if there are active connections while device features are cleared. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net/tls: fix state removal with feature flags offJakub Kicinski
TLS offload drivers shouldn't (and currently don't) block the TLS offload feature changes based on whether there are active offloaded connections or not. This seems to be a good idea, because we want the admin to be able to disable the TLS offload at any time, and there is no clean way of disabling it for active connections (TX side is quite problematic). So if features are cleared existing connections will stay offloaded until they close, and new connections will not attempt offload to a given device. However, the offload state removal handling is currently broken if feature flags get cleared while there are active TLS offloads. RX side will completely bail from cleanup, even on normal remove path, leaving device state dangling, potentially causing issues when the 5-tuple is reused. It will also fail to release the netdev reference. Remove the RX-side warning message, in next release cycle it should be printed when features are disabled, rather than when connection dies, but for that we need a more efficient method of finding connection of a given netdev (a'la BPF offload code). Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net/tls: avoid NULL-deref on resync during device removalJakub Kicinski
When netdev with active kTLS sockets in unregistered notifier callback walks the offloaded sockets and cleans up offload state. RX data may still be processed, however, and if resync was requested prior to device removal we would hit a NULL pointer dereference on ctx->netdev use. Make sure resync is under the device offload lock and NULL-check the netdev pointer. This should be safe, because the pointer is set to NULL either in the netdev notifier (under said lock) or when socket is completely dead and no resync can happen. The other access to ctx->netdev in tls_validate_xmit_skb() does not dereference the pointer, it just checks it against other device pointer, so it should be pretty safe (perhaps we can add a READ_ONCE/WRITE_ONCE there, if paranoid). Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22Validate required parameters in inet6_validate_link_afMaxim Mikityanskiy
inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or IFLA_INET6_ADDR_GET_MODE is passed. If none of them is passed, it returns -EINVAL, which may cause do_setlink() to fail in the middle of processing other commands and give the following warning message: A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check. Check the presence of at least one of them in inet6_validate_link_af to detect invalid parameters at an early stage, before do_setlink does anything. Also validate the address generation mode at an early stage. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.Felipe Gasper
This adds the ability for Netlink to report a socket's UID along with the other UNIX diagnostic information that is already available. This will allow diagnostic tools greater insight into which users control which socket. To test this, do the following as a non-root user: unshare -U -r bash nc -l -U user.socket.$$ & .. and verify from within that same session that Netlink UNIX socket diagnostics report the socket's UID as 0. Also verify that Netlink UNIX socket diagnostics report the socket's UID as the user's UID from an unprivileged process in a different session. Verify the same from a root process. Signed-off-by: Felipe Gasper <felipe@felipegasper.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Clear up some recent tipc regressions because of registration ordering. Fix from Junwei Hu. 2) tipc's TLV_SET() can read past the end of the supplied buffer during the copy. From Chris Packham. 3) ptp example program doesn't match the kernel, from Richard Cochran. 4) Outgoing message type fix in qrtr, from Bjorn Andersson. 5) Flow control regression in stmmac, from Tan Tee Min. 6) Fix inband autonegotiation in phylink, from Russell King. 7) Fix sk_bound_dev_if handling in rawv6_bind(), from Mike Manning. 8) Fix usbnet crash after disconnect, from Kloetzke Jan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) usbnet: fix kernel crash after disconnect selftests: fib_rule_tests: use pre-defined DEV_ADDR net-next: net: Fix typos in ip-sysctl.txt ipv6: Consider sk_bound_dev_if when binding a raw socket to an address net: phylink: ensure inband AN works correctly usbnet: ipheth: fix racing condition net: stmmac: dma channel control register need to be init first net: stmmac: fix ethtool flow control not able to get/set net: qrtr: Fix message type of outgoing packets networking: : fix typos in code comments ptp: Fix example program to match kernel. fddi: fix typos in code comments selftests: fib_rule_tests: enable forwarding before ipv4 from/iif test selftests: fib_rule_tests: fix local IPv4 address typo tipc: Avoid copying bytes beyond the supplied data 2/2] net: xilinx_emaclite: use readx_poll_timeout() in mdio wait function 1/2] net: axienet: use readx_poll_timeout() in mdio wait function vlan: Mark expected switch fall-through macvlan: Mark expected switch fall-through net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query ...
2019-05-22netfilter: nft_flow_offload: IPCB is only valid for ipv4 familyFlorian Westphal
Guard this with a check vs. ipv4, IPCB isn't valid in ipv6 case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-22netfilter: nft_flow_offload: don't offload when sequence numbers need adjustmentFlorian Westphal
We can't deal with tcp sequence number rewrite in flow_offload. While at it, simplify helper check, we only need to know if the extension is present, we don't need the helper data. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-22netfilter: nft_flow_offload: set liberal tracking mode for tcpFlorian Westphal
Without it, whenever a packet has to be pushed up the stack (e.g. because of mtu mismatch), then conntrack will flag packets as invalid, which in turn breaks NAT. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-22netfilter: nf_flow_table: ignore DF bit settingFlorian Westphal
Its irrelevant if the DF bit is set or not, we must pass packet to stack in either case. If the DF bit is set, we must pass it to stack so the appropriate ICMP error can be generated. If the DF is not set, we must pass it to stack for fragmentation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-21batman-adv: fix for leaked TVLV handler.Jeremy Sowden
A handler for BATADV_TVLV_ROAM was being registered when the translation-table was initialized, but not unregistered when the translation-table was freed. Unregister it. Fixes: 122edaa05940 ("batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packets") Reported-by: syzbot+d454a826e670502484b8@syzkaller.appspotmail.com Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Sven Eckelmann <sven@narfation.org Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-05-21batman-adv: Start new development cycleSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-05-21ipv6: Consider sk_bound_dev_if when binding a raw socket to an addressMike Manning
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF. Resolve by considering the device if sk_bound_dev_if is set. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-21Merge tag 'spdx-5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...