summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-10-18inet: frags: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: linux-wpan@vger.kernel.org Cc: netdev@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> # for ieee802154 Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18inet/connection_sock: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Cc: dccp@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18netfilter: ipset: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. This introduces a pointer back to the struct ip_set, which is used instead of the struct timer_list .data field. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: simran singhal <singhalsimran0@gmail.com> Cc: Muhammad Falak R Wani <falakreyaz@gmail.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: sched: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Add pointer back to Qdisc. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: can: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-can@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18xfrm: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() helper to pass the timer pointer explicitly. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net/rose: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-hams@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net/lapb: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Hans Liljestrand <ishkamiel@gmail.com> Cc: "Reshetova, Elena" <elena.reshetova@intel.com> Cc: linux-x25@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net/decnet: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes.berg@intel.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: linux-decnet-user@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: split dsa_port's netdev memberVivien Didelot
The dsa_port structure has a "netdev" member, which can be used for either the master device, or the slave device, depending on its type. It is true that today, CPU port are not exposed to userspace, thus the port's netdev member can be used to point to its master interface. But it is still slightly confusing, so split it into more explicit "master" and "slave" members inside an anonymous union. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: rename dsa_master_get_slaveVivien Didelot
The dsa_master_get_slave is slightly confusing since the idiomatic "get" term often suggests reference counting, in symmetry to "put". Rename it to dsa_master_find_slave to make the look up operation clear. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: add slave to master helperVivien Didelot
Many part of the DSA slave code require to get the master device assigned to a slave device. Remove dsa_master_netdev() in favor of a dsa_slave_to_master() helper which does that. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: add slave to port helperVivien Didelot
Many portions of DSA core code require to get the dsa_port structure corresponding to a slave net_device. For this purpose, introduce a dsa_slave_to_port() helper. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: add slave notify helperVivien Didelot
Both DSA slave create and destroy functions call call_dsa_notifiers with respectively DSA_PORT_REGISTER and DSA_PORT_UNREGISTER and the same dsa_notifier_register_info structure. Wrap this in a dsa_slave_notify helper so prevent cluttering these functions. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18net: dsa: use port's cpu_dp when creating a slaveVivien Didelot
When dsa_slave_create is called, the related port already has a CPU port assigned to it, available in its cpu_dp member. Use it instead of the unique tree cpu_dp. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18netlink: fix netlink_ack() extack raceJohannes Berg
It seems that it's possible to toggle NETLINK_F_EXT_ACK through setsockopt() while another thread/CPU is building a message inside netlink_ack(), which could then trigger the WARN_ON()s I added since if it goes from being turned off to being turned on between allocating and filling the message, the skb could end up being too small. Avoid this whole situation by storing the value of this flag in a separate variable and using that throughout the function instead. Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18netlink: use NETLINK_CB(in_skb).sk instead of looking it upJohannes Berg
When netlink_ack() reports an allocation error to the sending socket, there's no need to look up the sending socket since it's available in the SKB's CB. Use that instead of going to the trouble of looking it up. Note that the pointer is only available since Eric Biederman's commit 3fbc290540a1 ("netlink: Make the sending netlink socket availabe in NETLINK_CB") which is far newer than the original lookup code (Oct 2003) (though the field was called 'ssk' in that commit and only got renamed to 'sk' later, I'd actually argue 'ssk' was better - or perhaps it should've been 'source_sk' - since there are so many different 'sk's involved.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: cpumap xdp_buff to skb conversion and allocationJesper Dangaard Brouer
This patch makes cpumap functional, by adding SKB allocation and invoking the network stack on the dequeuing CPU. For constructing the SKB on the remote CPU, the xdp_buff in converted into a struct xdp_pkt, and it mapped into the top headroom of the packet, to avoid allocating separate mem. For now, struct xdp_pkt is just a cpumap internal data structure, with info carried between enqueue to dequeue. If a driver doesn't have enough headroom it is simply dropped, with return code -EOVERFLOW. This will be picked up the xdp tracepoint infrastructure, to allow users to catch this. V2: take into account xdp->data_meta V4: - Drop busypoll tricks, keeping it more simple. - Skip RPS and Generic-XDP-recursive-reinjection, suggested by Alexei V5: correct RCU read protection around __netif_receive_skb_core. V6: Setting TASK_RUNNING vs TASK_INTERRUPTIBLE based on talk with Rik van Riel Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: XDP_REDIRECT enable use of cpumapJesper Dangaard Brouer
This patch connects cpumap to the xdp_do_redirect_map infrastructure. Still no SKB allocation are done yet. The XDP frames are transferred to the other CPU, but they are simply refcnt decremented on the remote CPU. This served as a good benchmark for measuring the overhead of remote refcnt decrement. If driver page recycle cache is not efficient then this, exposes a bottleneck in the page allocator. A shout-out to MST's ptr_ring, which is the secret behind is being so efficient to transfer memory pointers between CPUs, without constantly bouncing cache-lines between CPUs. V3: Handle !CONFIG_BPF_SYSCALL pointed out by kbuild test robot. V4: Make Generic-XDP aware of cpumap type, but don't allow redirect yet, as implementation require a separate upstream discussion. V5: - Fix a maybe-uninitialized pointed out by kbuild test robot. - Restrict bpf-prog side access to cpumap, open when use-cases appear - Implement cpu_map_enqueue() as a more simple void pointer enqueue V6: - Allow cpumap type for usage in helper bpf_redirect_map, general bpf-prog side restriction moved to earlier patch. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signalsDavid Howells
Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to ignore signals whilst loading up the message queue, provided progress is being made in emptying the queue at the other side. Progress is defined as the base of the transmit window having being advanced within 2 RTT periods. If the period is exceeded with no progress, sendmsg() will return anyway, indicating how much data has been copied, if any. Once the supplied buffer is entirely decanted, the sendmsg() will return. Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18rxrpc: Provide functions for allowing cleaner handling of signalsDavid Howells
Provide a couple of functions to allow cleaner handling of signals in a kernel service. They are: (1) rxrpc_kernel_get_rtt() This allows the kernel service to find out the RTT time for a call, so as to better judge how large a timeout to employ. Note, though, that whilst this returns a value in nanoseconds, the timeouts can only actually be in jiffies. (2) rxrpc_kernel_check_life() This returns a number that is updated when ACKs are received from the peer (notably including PING RESPONSE ACKs which we can elicit by sending PING ACKs to see if the call still exists on the server). The caller should compare the numbers of two calls to see if the call is still alive. These can be used to provide an extending timeout rather than returning immediately in the case that a signal occurs that would otherwise abort an RPC operation. The timeout would be extended if the server is still responsive and the call is still apparently alive on the server. For most operations this isn't that necessary - but for FS.StoreData it is: OpenAFS writes the data to storage as it comes in without making a backup, so if we immediately abort it when partially complete on a CTRL+C, say, we have no idea of the state of the file after the abort. Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18rxrpc: Support service upgrade from a kernel serviceDavid Howells
Provide support for a kernel service to make use of the service upgrade facility. This involves: (1) Pass an upgrade request flag to rxrpc_kernel_begin_call(). (2) Make rxrpc_kernel_recv_data() return the call's current service ID so that the caller can detect service upgrade and see what the service was upgraded to. Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18KEYS: Fix race between updating and finding a negative keyDavid Howells
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection error into one field such that: (1) The instantiation state can be modified/read atomically. (2) The error can be accessed atomically with the state. (3) The error isn't stored unioned with the payload pointers. This deals with the problem that the state is spread over three different objects (two bits and a separate variable) and reading or updating them atomically isn't practical, given that not only can uninstantiated keys change into instantiated or rejected keys, but rejected keys can also turn into instantiated keys - and someone accessing the key might not be using any locking. The main side effect of this problem is that what was held in the payload may change, depending on the state. For instance, you might observe the key to be in the rejected state. You then read the cached error, but if the key semaphore wasn't locked, the key might've become instantiated between the two reads - and you might now have something in hand that isn't actually an error code. The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error code if the key is negatively instantiated. The key_is_instantiated() function is replaced with key_is_positive() to avoid confusion as negative keys are also 'instantiated'. Additionally, barriering is included: (1) Order payload-set before state-set during instantiation. (2) Order state-read before payload-read when using the key. Further separate barriering is necessary if RCU is being used to access the payload content after reading the payload pointers. Fixes: 146aa8b1453b ("KEYS: Merge the type-specific data with the payload data") Cc: stable@vger.kernel.org # v4.4+ Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@google.com>
2017-10-18mac80211: validate user rate mask before configuring driverJohannes Berg
Ben reported that when the user rate mask is rejected for not matching any basic rate, the driver had already been configured. This is clearly an oversight in my original change, fix this by doing the validation before calling the driver. Reported-by: Ben Greear <greearb@candelatech.com> Fixes: e8e4f5280ddd ("mac80211: reject/clear user rate mask if not usable") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-18cfg80211: fix connect/disconnect edge casesJohannes Berg
If we try to connect while already connected/connecting, but this fails, we set ssid_len=0 but leave current_bss hanging, leading to errors. Check all of this better, first of all ensuring that we can't try to connect to a different SSID while connected/ing; ensure that prev_bssid is set for re-association attempts even in the case of the driver supporting the connect() method, and don't reset ssid_len in the failure cases. While at it, also reset ssid_len while disconnecting unless we were connected and expect a disconnected event, and warn on a successful connection without ssid_len being set. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-18mac80211: use constant time comparison with keysJason A. Donenfeld
Otherwise we risk leaking information via timing side channel. Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-17ethtool: add ethtool_intersect_link_masksAlan Brady
This function provides a way to intersect two link masks together to find the common ground between them. For example in i40e, the driver first generates link masks for what is supported by the PHY type. The driver then gets the link masks for what the NVM supports. The resulting intersection between them yields what can truly be supported. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-17net: export netdev_txq_to_tc to allow sch_mqprio to compile as moduleHenrik Austad
In commit 32302902ff09 ("mqprio: Reserve last 32 classid values for HW traffic classes and misc IDs") sch_mqprio started using netdev_txq_to_tc to find the correct tc instead of dev->tc_to_txq[] However, when mqprio is compiled as a module, it cannot resolve the symbol, leading to this error: ERROR: "netdev_txq_to_tc" [net/sched/sch_mqprio.ko] undefined! This adds an EXPORT_SYMBOL() since the other user in the kernel (netif_set_xps_queue) is also EXPORT_SYMBOL() (and not _GPL) or in a sysfs-callback. Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Henrik Austad <haustad@cisco.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-17batman-adv: Avoid spurious warnings from bat_v neigh_cmp implementationSven Eckelmann
The neighbor compare API implementation for B.A.T.M.A.N. V checks whether the neigh_ifinfo for this neighbor on a specific interface exists. A warning is printed when it isn't found. But it is not called inside a lock which would prevent that this information is lost right before batadv_neigh_ifinfo_get. It must therefore be expected that batadv_v_neigh_(cmp|is_sob) might not be able to get the requested neigh_ifinfo. A WARN_ON for such a situation seems not to be appropriate because this will only flood the kernel logs. The warnings must therefore be removed. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-10-16tipc: fix rebasing errorJon Maloy
In commit 2f487712b893 ("tipc: guarantee that group broadcast doesn't bypass group unicast") there was introduced a last-minute rebasing error that broke non-group communication. We fix this here. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16Merge tag 'mac80211-for-davem-2017-10-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a single fix, for a WoWLAN-related part of CVE-2017-13080. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: core: rcu-ify rtnl af_opsFlorian Westphal
rtnl af_ops currently rely on rtnl mutex: unregister (called from module exit functions) takes the rtnl mutex and all users that do af_ops lookup also take the rtnl mutex. IOW, parallel rmmod will block until doit() callback is done. As none of the af_ops implementation sleep we can use rcu instead. doit functions that need the af_ops can now use rcu instead of the rtnl mutex provided the mutex isn't needed for other reasons. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: place link af dump into own helperFlorian Westphal
next patch will rcu-ify rtnl af_ops, i.e. allow af_ops lookup and function calls with rcu read lock held instead of rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16tcp: cdg: make struct tcp_cdg staticColin Ian King
The structure tcp_cdg is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'tcp_cdg' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16dev_ioctl: add missing NETDEV_CHANGE_TX_QUEUE_LEN event notificationXin Long
When changing dev tx_queue_len via netlink or net-sysfs, a NETDEV_CHANGE_TX_QUEUE_LEN event notification will be called. But dev_ioctl missed this event notification, which could cause no userspace notification would be sent. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net/sched: cls_flower: Set egress_dev mark when calling into the HW driverOr Gerlitz
Commit 7091d8c '(net/sched: cls_flower: Add offload support using egress Hardware device') made sure (when fl_hw_replace_filter is called) to put the egress_dev mark on persisent structure instance. Hence, following calls into the HW driver for stats and deletion will note it and act accordingly. With commit de4784ca030f this property is lost and hence when called, the HW driver failes to operate (stats, delete) on the offloaded flow. Fix it by setting the egress_dev flag whenever the ingress device is different from the hw device since this is exactly the condition under which we're calling into the HW driver through the egress port net-device. Fixes: de4784ca030f ('net: sched: get rid of struct tc_to_netdev') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: dccp: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that for options.c file, I placed the "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16ieee802154: netlink: fix typo of the name of struct genl_opsXue Liu
Signed-off-by: Xue Liu <liuxuenetmail@gmail.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2017-10-16ipv6: only update __use and lastusetime once per jiffy at mostWei Wang
In order to not dirty the cacheline too often, we try to only update dst->__use and dst->lastusetime at most once per jiffy. As dst->lastusetime is only used by ipv6 garbage collector, it should be good enough time resolution. And __use is only used in ipv6_route_seq_show() to show how many times a dst has been used. And as __use is not atomic_t right now, it does not show the precise number of usage times anyway. So we think it should be OK to only update it at most once per jiffy. According to my latest syn flood test on a machine with intel Xeon 6th gen processor and 2 10G mlx nics bonded together, each with 8 rx queues on 2 NUMA nodes: With this patch, the packet process rate increases from ~3.49Mpps to ~3.75Mpps with a 7% increase rate. Note: dst_use() is being renamed to dst_hold_and_use() to better specify the purpose of the function. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@googl.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16ipv6: check fn before doing FIB6_SUBTREE(fn)Wei Wang
In fib6_locate(), we need to first make sure fn is not NULL before doing FIB6_SUBTREE(fn) to avoid crash. This fixes the following static checker warning: net/ipv6/ip6_fib.c:1462 fib6_locate() warn: variable dereferenced before check 'fn' (see line 1459) net/ipv6/ip6_fib.c 1458 if (src_len) { 1459 struct fib6_node *subtree = FIB6_SUBTREE(fn); ^^^^^^^^^^^^^^^^ We shifted this dereference 1460 1461 WARN_ON(saddr == NULL); 1462 if (fn && subtree) ^^ before the check for NULL. 1463 fn = fib6_locate_1(subtree, saddr, src_len, 1464 offsetof(struct rt6_info, rt6i_src) Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16tun: call dev_get_valid_name() before register_netdevice()Cong Wang
register_netdevice() could fail early when we have an invalid dev name, in which case ->ndo_uninit() is not called. For tun device, this is a problem because a timer etc. are already initialized and it expects ->ndo_uninit() to clean them up. We could move these initializations into a ->ndo_init() so that register_netdevice() knows better, however this is still complicated due to the logic in tun_detach(). Therefore, I choose to just call dev_get_valid_name() before register_netdevice(), which is quicker and much easier to audit. And for this specific case, it is already enough. Fixes: 96442e42429e ("tuntap: choose the txq based on rxq") Reported-by: Dmitry Alexeev <avekceeb@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: propagate q and parent from caller down to tcf_fill_nodeJiri Pirko
The callers have this info, they will pass it down to tcf_fill_node. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: use tcf_block_q helper to get q pointer for sch_tree_lockJiri Pirko
Use tcf_block_q helper to get q pointer to be used for direct call of sch_tree_lock/unlock instead of tcf_tree_lock/unlock. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct QdiscJiri Pirko
Use helper to get q pointer per block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: cls_u32: use block instead of q in tc_u_commonJiri Pirko
tc_u_common is now per-q. With blocks, it has to be converted to be per-block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: ematch: obtain net pointer from blocksJiri Pirko
Instead of using tp->q, use block to get the net pointer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: store net pointer in block and introduce qdisc_net helperJiri Pirko
Store net pointer in the block structure. Along the way, introduce qdisc_net helper which allows to easily obtain net pointer for qdisc instance. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: sched: store Qdisc pointer in struct blockJiri Pirko
Prepare for removal of tp->q and store Qdisc pointer in the block structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16mqprio: Reserve last 32 classid values for HW traffic classes and misc IDsAlexander Duyck
This patch makes a slight tweak to mqprio in order to bring the classid values used back in line with what is used for mq. The general idea is to reserve values :ffe0 - :ffef to identify hardware traffic classes normally reported via dev->num_tc. By doing this we can maintain a consistent behavior with mq for classid where :1 - :ffdf will represent a physical qdisc mapped onto a Tx queue represented by classid - 1, and the traffic classes will be mapped onto a known subset of classid values reserved for our virtual qdiscs. Note I reserved the range from :fff0 - :ffff since this way we might be able to reuse these classid values with clsact and ingress which would mean that for mq, mqprio, ingress, and clsact we should be able to maintain a similar classid layout. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: enable interface alias removal via rtnlNicolas Dichtel
IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of the attribute is 1 ("\0"). However, to remove an alias, the attribute length must be 0 (see dev_set_alias()). Let's define the type to NLA_BINARY to allow 0-length string, so that the alias can be removed. Example: $ ip l s dummy0 alias foo $ ip l l dev dummy0 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff alias foo Before the patch: $ ip l s dummy0 alias "" RTNETLINK answers: Numerical result out of range After the patch: $ ip l s dummy0 alias "" $ ip l l dev dummy0 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff CC: Oliver Hartkopp <oliver@hartkopp.net> CC: Stephen Hemminger <stephen@networkplumber.org> Fixes: 96ca4a2cc145 ("net: remove ifalias on empty given alias") Reported-by: Julien FLoret <julien.floret@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>