summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-12-20cls_bpf: fix offload assumptions after callback conversionJakub Kicinski
cls_bpf used to take care of tracking what offload state a filter is in, i.e. it would track if offload request succeeded or not. This information would then be used to issue correct requests to the driver, e.g. requests for statistics only on offloaded filters, removing only filters which were offloaded, using add instead of replace if previous filter was not added etc. This tracking of offload state no longer functions with the new callback infrastructure. There could be multiple entities trying to offload the same filter. Throw out all the tracking and corresponding commands and simply pass to the drivers both old and new bpf program. Drivers will have to deal with offload state tracking by themselves. Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20bridge: Use helpers to handle MAC addressAndy Shevchenko
Use %pM to print MAC mac_pton() to convert it from ASCII to binary format, and ether_addr_copy() to copy. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: Fix double free and memory corruption in get_net_ns_by_id()Eric W. Biederman
(I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20ip6_vti: adjust vti mtu according to mtu of lower deviceAlexey Kodanev
LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over ip6_vti that require fragmentation and the underlying device has an MTU smaller than 1500 plus some extra space for headers. This happens because ip6_vti, by default, sets MTU to ETH_DATA_LEN and not updating it depending on a destination address or link parameter. Further attempts to send UDP packets may succeed because pmtu gets updated on ICMPV6_PKT_TOOBIG in vti6_err(). In case the lower device has larger MTU size, e.g. 9000, ip6_vti works but not using the possible maximum size, output packets have 1500 limit. The above cases require manual MTU setup after ip6_vti creation. However ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev(). Here is the example when the lower device MTU is set to 9000: # ip a sh ltp_ns_veth2 ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ... inet 10.0.0.2/24 scope global ltp_ns_veth2 inet6 fd00::2/64 scope global # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ... link/tunnel6 fd00::2 peer fd00::1 After the patch: # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ... link/tunnel6 fd00::2 peer fd00::1 Reported-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20esp: Don't require synchronous crypto fallback on offloading anymore.Steffen Klassert
We support asynchronous crypto on layer 2 ESP now. So no need to force synchronous crypto fallback on offloading anymore. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20xfrm: Allow to use the layer2 IPsec GSO codepath for software crypto.Steffen Klassert
We now have support for asynchronous crypto operations in the layer 2 TX path. This was the missing part to allow the GSO codepath for software crypto, so allow this codepath now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20net: Add asynchronous callbacks for xfrm on layer 2.Steffen Klassert
This patch implements asynchronous crypto callbacks and a backlog handler that can be used when IPsec is done at layer 2 in the TX path. It also extends the skb validate functions so that we can update the driver transmit return codes based on async crypto operation or to indicate that we queued the packet in a backlog queue. Joint work with: Aviv Heller <avivh@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20xfrm: Separate ESP handling from segmentation for GRO packets.Steffen Klassert
We change the ESP GSO handlers to only segment the packets. The ESP handling and encryption is defered to validate_xmit_xfrm() where this is done for non GRO packets too. This makes the code more robust and prepares for asynchronous crypto handling. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19ipv4: fib: Fix metrics match when deleting a routePhil Sutter
The recently added fib_metrics_match() causes a regression for routes with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has TCP_CONG_NEEDS_ECN flag set: | # ip link add d0 type dummy | # ip link set d0 up | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp | RTNETLINK answers: No such process During route insertion, fib_convert_metrics() detects that the given CC algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in RTAX_FEATURES. During route deletion though, fib_metrics_match() compares stored RTAX_FEATURES value with that from userspace (which obviously has no knowledge about DST_FEATURE_ECN_CA) and fails. Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net_sched: properly check for empty skb array on error pathCong Wang
First, the check of &q->ring.queue against NULL is wrong, it is always false. We should check the value rather than the address. Secondly, we need the same check in pfifo_fast_reset() too, as both ->reset() and ->destroy() are called in qdisc_destroy(). Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Reported-by: syzbot <syzkaller@googlegroups.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19tipc: fix list sorting bug in function tipc_group_update_member()Jon Maloy
When, during a join operation, or during message transmission, a group member needs to be added to the group's 'congested' list, we sort it into the list in ascending order, according to its current advertised window size. However, we miss the case when the member is already on that list. This will have the result that the member, after the window size has been decremented, might be at the wrong position in that list. This again may have the effect that we during broadcast and multicast transmissions miss the fact that a destination is not yet ready for reception, and we end up sending anyway. From this point on, the behavior during the remaining session is unpredictable, e.g., with underflowing window sizes. We now correct this bug by unconditionally removing the member from the list before (re-)sorting it in. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-12-18 Here's the first bluetooth-next pull request for the 4.16 kernel. - hci_ll: multiple cleanups & fixes - Remove Gustavo Padovan from the MAINTAINERS file - Support BLE Adversing while connected (if the controller can do it) - DT updates for TI chips - Various other smaller cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip6_tunnel: get the min mtu properly in ip6_tnl_xmitXin Long
Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but IPV6_MIN_MTU actually only works when the inner packet is ipv6. With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst couldn't be set less than 1280. It would cause tx_err and the packet to be dropped when the outer dst pmtu is close to 1280. Jianlin found it by running ipv4 traffic with the topo: (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server) After changing eth2 mtu to 1300, the performance became very low, or the connection was even broken. The issue also affects ip4ip6 and ip6ip6 tunnels. So if the inner packet is ipv4, 576 should be considered as the min mtu. Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP. This patch using 576 as the min mtu for non-ipv6 packet works for all those cases. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip6_gre: remove the incorrect mtu limit for ipgre tapXin Long
The same fix as the patch "ip_gre: remove the incorrect mtu limit for ipgre tap" is also needed for ip6_gre. Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19ip_gre: remove the incorrect mtu limit for ipgre tapXin Long
ipgre tap driver calls ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. It causes the dev mtu of the ipgre tap device to not be greater than 1500, this limit value is not correct for ipgre tap device. Besides, it's .change_mtu already does the right check. So this patch is just to set max_mtu as 0, and leave the check to it's .change_mtu. Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: Disable GRO_HW when generic XDP is installed on a device.Michael Chan
Hardware should not aggregate any packets when generic XDP is installed. Cc: Ariel Elior <Ariel.Elior@cavium.com> Cc: everest-linux-l2@cavium.com Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19net: Introduce NETIF_F_GRO_HW.Michael Chan
Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware GRO. With this flag, we can now independently turn on or off hardware GRO when GRO is on. Previously, drivers were using NETIF_F_GRO to control hardware GRO and so it cannot be independently turned on or off without affecting GRO. Hardware GRO (just like GRO) guarantees that packets can be re-segmented by TSO/GSO to reconstruct the original packet stream. Logically, GRO_HW should depend on GRO since it a subset, but we will let individual drivers enforce this dependency as they see fit. Since NETIF_F_GRO is not propagated between upper and lower devices, NETIF_F_GRO_HW should follow suit since it is a subset of GRO. In other words, a lower device can independent have GRO/GRO_HW enabled or disabled and no feature propagation is required. This will preserve the current GRO behavior. This can be changed later if we decide to propagate GRO/ GRO_HW/RXCSUM from upper to lower devices. Cc: Ariel Elior <Ariel.Elior@cavium.com> Cc: everest-linux-l2@cavium.com Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19sock: Move the socket inuse to namespace.Tonghao Zhang
In some case, we want to know how many sockets are in use in different _net_ namespaces. It's a key resource metric. This patch add a member in struct netns_core. This is a counter for socket-inuse in the _net_ namespace. The patch will add/sub counter in the sk_alloc, sk_clone_lock and __sk_free. This patch will not counter the socket created in kernel. It's not very useful for userspace to know how many kernel sockets we created. The main reasons for doing this are that: 1. When linux calls the 'do_exit' for process to exit, the functions 'exit_task_namespaces' and 'exit_task_work' will be called sequentially. 'exit_task_namespaces' may have destroyed the _net_ namespace, but 'sock_release' called in 'exit_task_work' may use the _net_ namespace if we counter the socket-inuse in sock_release. 2. socket and sock are in pair. More important, sock holds the _net_ namespace. We counter the socket-inuse in sock, for avoiding holding _net_ namespace again in socket. It's a easy way to maintain the code. Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com> Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19sock: Change the netns_core member name.Tonghao Zhang
Change the member name will make the code more readable. This patch will be used in next patch. Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com> Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19Merge tag 'mac80211-for-davem-2017-12-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few more fixes: * hwsim: - set To-DS bit in some frames missing it - fix sleeping in atomic * nl80211: - doc cleanup - fix locking in an error path * build: - don't append to created certs C files - ship certificate pre-hexdumped ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19cfg80211: Scan results to also report the per chain signal strengthSunil Dutt
This commit enhances the scan results to report the per chain signal strength based on the latest BSS update. This provides similar information to what is already available through STA information. Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19nl80211: send deauth reason if locally generatedDavid Spinadel
Send disconnection reason code to user space even if it's locally generated, since some tests that check reason code may fail because of the current behavior. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19Revert "mac80211: Add TXQ scheduling API"Johannes Berg
This reverts commit e937b8da5a591f141fe41aa48a2e898df9888c95. Turns out that a new driver (mt76) is coming in through Kalle's tree, and will conflict with this. It also has some conflicting requirements, so we'll revisit this later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19Revert "mac80211: Add airtime account and scheduling to TXQs"Johannes Berg
This reverts commit b0d52ad821843a6c5badebd80feef9f871904fa6. We need to revert the TXQ scheduling API due to conflicts with a new driver, and this depends on that API. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19cfg80211: ship certificates as hex filesJohannes Berg
Not only does this remove the need for the hexdump code in most normal kernel builds (still there for the extra directory), but it also removes the need to ship binary files, which apparently is somewhat problematic, as Randy reported. While at it, also add the generated files to clean-files. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19cfg80211: always rewrite generated files from scratchThierry Reding
Currently the certs C code generation appends to the generated files, which is most likely a leftover from commit 715a12334764 ("wireless: don't write C files on failures"). This causes duplicate code in the generated files if the certificates have their timestamps modified between builds and thereby trigger the generation rules. Fixes: 715a12334764 ("wireless: don't write C files on failures") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19xfrm: Reinject transport-mode packets through taskletHerbert Xu
This is an old bugbear of mine: https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html By crafting special packets, it is possible to cause recursion in our kernel when processing transport-mode packets at levels that are only limited by packet size. The easiest one is with DNAT, but an even worse one is where UDP encapsulation is used in which case you just have to insert an UDP encapsulation header in between each level of recursion. This patch avoids this problem by reinjecting tranport-mode packets through a tasklet. Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19bpf: make function xdp_do_generic_redirect_map() staticXiongwei Song
The function xdp_do_generic_redirect_map() is only used in this file, so make it static. Clean up sparse warning: net/core/filter.c:2687:5: warning: no previous prototype for 'xdp_do_generic_redirect_map' [-Wmissing-prototypes] Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-18net: erspan: reload pointer after pskb_may_pullWilliam Tu
pskb_may_pull() can change skb->data, so we need to re-load pkt_md and ershdr at the right place. Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: William Tu <u9012063@gmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18net: erspan: fix wrong return valueWilliam Tu
If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: William Tu <u9012063@gmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18net/ncsi: Don't take any action on HNCDSC AENSamuel Mendoza-Jonas
The current HNCDSC handler takes the status flag from the AEN packet and will update or change the current channel based on this flag and the current channel status. However the flag from the HNCDSC packet merely represents the host link state. While the state of the host interface is potentially interesting information it should not affect the state of the NCSI link. Indeed the NCSI specification makes no mention of any recommended action related to the host network controller driver state. Update the HNCDSC handler to record the host network driver status but take no other action. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaksNikolay Aleksandrov
The early call to br_stp_change_bridge_id in bridge's newlink can cause a memory leak if an error occurs during the newlink because the fdb entries are not cleaned up if a different lladdr was specified, also another minor issue is that it generates fdb notifications with ifindex = 0. Another unrelated memory leak is the bridge sysfs entries which get added on NETDEV_REGISTER event, but are not cleaned up in the newlink error path. To remove this special case the call to br_stp_change_bridge_id is done after netdev register and we cleanup the bridge on changelink error via br_dev_delete to plug all leaks. This patch makes netlink bridge destruction on newlink error the same as dellink and ioctl del which is necessary since at that point we have a fully initialized bridge device. To reproduce the issue: $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1 RTNETLINK answers: Invalid argument $ rmmod bridge [ 1822.142525] ============================================================================= [ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown() [ 1822.144821] ----------------------------------------------------------------------------- [ 1822.145990] Disabling lock debugging due to kernel taint [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100 [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87 [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1822.150008] Call Trace: [ 1822.150510] dump_stack+0x78/0xa9 [ 1822.151156] slab_err+0xb1/0xd3 [ 1822.151834] ? __kmalloc+0x1bb/0x1ce [ 1822.152546] __kmem_cache_shutdown+0x151/0x28b [ 1822.153395] shutdown_cache+0x13/0x144 [ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb [ 1822.154669] SyS_delete_module+0x194/0x244 [ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a [ 1822.156343] RIP: 0033:0x7f929bd38b17 [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17 [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0 [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11 [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80 [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090 [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0 [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128 Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device") Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18sctp: add SCTP_CID_RECONF conversion in sctp_cnameXin Long
Whenever a new type of chunk is added, the corresp conversion in sctp_cname should be added. Otherwise, in some places, pr_debug will print it as "unknown chunk". Fixes: cc16f00f6529 ("sctp: add support for generating stream reconf ssn reset request chunk") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18sctp: fix the issue that a __u16 variable may overflow in sctp_ulpq_renegeXin Long
Now when reneging events in sctp_ulpq_renege(), the variable freed could be increased by a __u16 value twice while freed is of __u16 type. It means freed may overflow at the second addition. This patch is to fix it by using __u32 type for 'freed', while at it, also to remove 'if (chunk)' check, as all renege commands are generated in sctp_eat_data and it can't be NULL. Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18tipc: remove leaving group member from all listsJon Maloy
A group member going into state LEAVING should never go back to any other state before it is finally deleted. However, this might happen if the socket needs to send out a RECLAIM message during this interval. Since we forget to remove the leaving member from the group's 'active' or 'pending' list, the member might be selected for reclaiming, change state to RECLAIMING, and get stuck in this state instead of being deleted. This might lead to suppression of the expected 'member down' event to the receiver. We fix this by removing the member from all lists, except the RB tree, at the moment it goes into state LEAVING. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18tipc: fix lost member events bugJon Maloy
Group messages are not supposed to be returned to sender when the destination socket disappears. This is done correctly for regular traffic messages, by setting the 'dest_droppable' bit in the header. But we forget to do that in group protocol messages. This has the effect that such messages may sometimes bounce back to the sender, be perceived as a legitimate peer message, and wreak general havoc for the rest of the session. In particular, we have seen that a member in state LEAVING may go back to state RECLAIMED or REMITTED, hence causing suppression of an otherwise expected 'member down' event to the user. We fix this by setting the 'dest_droppable' bit even in group protocol messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-17 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a corner case in generic XDP where we have non-linear skbs but enough tailroom in the skb to not miss to linearizing there, from Song. 2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when BPF context is not skb, from Daniel. 3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper call would use the wrong register for the skb, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18Merge 4.15-rc4 into staging-nextGreg Kroah-Hartman
We want the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16ipv6: icmp6: Allow icmp messages to be looped backBrendan McGrath
One example of when an ICMPv6 packet is required to be looped back is when a host acts as both a Multicast Listener and a Multicast Router. A Multicast Router will listen on address ff02::16 for MLDv2 messages. Currently, MLDv2 messages originating from a Multicast Listener running on the same host as the Multicast Router are not being delivered to the Multicast Router. This is due to dst.input being assigned the default value of dst_discard. This results in the packet being looped back but discarded before being delivered to the Multicast Router. This patch sets dst.input to ip6_input to ensure a looped back packet is delivered to the Multicast Router. Signed-off-by: Brendan McGrath <redmcg@redmandi.dyndns.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes, two in the packet scheduler and one in the meson-gxl PHY driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-16Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "This has two stable bugfixes, one to fix a BUG_ON() when nfs_commit_inode() is called with no outstanding commit requests and another to fix a race in the SUNRPC receive codepath. Additionally, there are also fixes for an NFS client deadlock and an xprtrdma performance regression. Summary: Stable bugfixes: - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a commit in the case that there were no commit requests. - SUNRPC: Fix a race in the receive code path Other fixes: - NFS: Fix a deadlock in nfs client initialization - xprtrdma: Fix a performance regression for small IOs" * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Fix a race in the receive code path nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests xprtrdma: Spread reply processing over more CPUs nfs: fix a deadlock in nfs client initialization
2017-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot. 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik Brueckner. 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes Berg. 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin. 5) Don't advertise gigabit in sh_eth when not available, from Thomas Petazzoni. 6) Check network namespace when delivering to netlink taps, from Kevin Cernekee. 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam. 8) Use correct address in TCP md5 lookups when replying to an incoming segment, from Christoph Paasch. 9) Add schedule points to BPF map alloc/free, from Eric Dumazet. 10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also from Eric Dumazet. 11) Fix SKB leak in tipc, from Jon Maloy. 12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz. 13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn. 14) Add some new qmi_wwan device IDs, from Daniele Palmas. 15) Fix static key imbalance in ingress qdisc, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: qcom/emac: Reduce timeout for mdio read/write net: sched: fix static key imbalance in case of ingress/clsact_init error net: sched: fix clsact init error path ip_gre: fix wrong return value of erspan_rcv net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support pkt_sched: Remove TC_RED_OFFLOADED from uapi net: sched: Move to new offload indication in RED net: sched: Add TCA_HW_OFFLOAD net: aquantia: Increment driver version net: aquantia: Fix typo in ethtool statistics names net: aquantia: Update hw counters on hw init net: aquantia: Improve link state and statistics check interval callback net: aquantia: Fill in multicast counter in ndev stats from hardware net: aquantia: Fill ndev stat couters from hardware net: aquantia: Extend stat counters to 64bit values net: aquantia: Fix hardware DMA stream overload on large MRRS net: aquantia: Fix actual speed capabilities reporting sock: free skb in skb_complete_tx_timestamp on error s390/qeth: update takeover IPs after configuration change s390/qeth: lock IP table while applying takeover changes ...
2017-12-15net: sched: fix static key imbalance in case of ingress/clsact_init errorJiri Pirko
Move static key increments to the beginning of the init function so they pair 1:1 with decrements in ingress/clsact_destroy, which is called in case ingress/clsact_init fails. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: sched: fix clsact init error pathJiri Pirko
Since in qdisc_create, the destroy op is called when init fails, we don't do cleanup in init and leave it up to destroy. This fixes use-after-free when trying to put already freed block. Fixes: 6e40cf2d4dee ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15SUNRPC: Fix a race in the receive code pathTrond Myklebust
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15xprtrdma: Spread reply processing over more CPUsChuck Lever
Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi- threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput. Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions. The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started. This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ. So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU. Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call. Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15ip_gre: fix wrong return value of erspan_rcvHaishuang Yan
If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: support sysctl to allow users to use stream interleaveXin Long
This is the last patch for support of stream interleave, after this patch, users could enable stream interleave by systcl -w net.sctp.intl_enable=1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: update mid instead of ssn when doing stream and asoc resetXin Long
When using idata and doing stream and asoc reset, setting ssn with 0 could only clear the 1st 16 bits of mid. So to make this work for both data and idata, it sets mid with 0 instead of ssn, and also mid_uo for unordered idata also need to be cleared, as said in section 2.3.2 of RFC8260. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: add stream interleave support in stream schedulerXin Long
As Marcelo said in the stream scheduler patch: Support for I-DATA chunks, also described in RFC8260, with user message interleaving is straightforward as it just requires the schedulers to probe for the feature and ignore datamsg boundaries when dequeueing. All needs to do is just to ignore datamsg boundaries when dequeueing. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>