summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-06-01net: dsa: make function ksz_rcv staticColin Ian King
function ksz_rcv can be made static as it does not need to be in global scope. Reformat arguments to make it checkpatch warning free too. Cleans up sparse warning: "symbol 'ksz_rcv' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01netlink: don't send unknown nsidNicolas Dichtel
The NETLINK_F_LISTEN_ALL_NSID otion enables to listen all netns that have a nsid assigned into the netns where the netlink socket is opened. The nsid is sent as metadata to userland, but the existence of this nsid is checked only for netns that are different from the socket netns. Thus, if no nsid is assigned to the socket netns, NETNSA_NSID_NOT_ASSIGNED is reported to the userland. This value is confusing and useless. After this patch, only valid nsid are sent to userland. Reported-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01ipv4: route: restore skb_dst_set in inet_rtm_getrouteRoopa Prabhu
recent updates to inet_rtm_getroute dropped skb_dst_set in inet_rtm_getroute. This patch restores it because it is needed to release the dst correctly. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31dsa: add support for Microchip KSZ tail taggingWoojung Huh
Adding support for the Microchip KSZ switch family tail tagging. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31bpf: track stack depth of classic bpf programsAlexei Starovoitov
To track stack depth of classic bpf programs we only need to analyze ST|STX instructions, since check_load_and_stores() verifies that programs can load from stack only after write. We also need to change the way cBPF stack slots map to eBPF stack, since typical classic programs are using slots 0 and 1, so they need to map to stack offsets -4 and -8 respectively in order to take advantage of small stack interpreter and JITs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31mpls: fix clearing of dead nh_flags on link upRoopa Prabhu
recent fixes to use WRITE_ONCE for nh_flags on link up, accidently ended up leaving the deadflags on a nh. This patch fixes the WRITE_ONCE to use freshly evaluated nh_flags. Fixes: 39eb8cd17588 ("net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE") Reported-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31rtnetlink: use the new rtnl_get_event() interfaceVlad Yasevich
Small clean-up to rtmsg_ifinfo() to use the rtnl_get_event() interface instead of using 'internal' values directly. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31net: dsa: remove dev arg of dsa_register_switchVivien Didelot
The current dsa_register_switch function takes a useless struct device pointer argument, which always equals ds->dev. Drivers either call it with ds->dev, or with the same device pointer passed to dsa_switch_alloc, which ends up being assigned to ds->dev. This patch removes the second argument of the dsa_register_switch and _dsa_register_switch functions. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31tcp: reinitialize MTU probing when setting MSS in a TCP repairDouglas Caetano dos Santos
MTU probing initialization occurred only at connect() and at SYN or SYN-ACK reception, but the former sets MSS to either the default or the user set value (through TCP_MAXSEG sockopt) and the latter never happens with repaired sockets. The result was that, with MTU probing enabled and unless TCP_MAXSEG sockopt was used before connect(), probing would be stuck at tcp_base_mss value until tcp_probe_interval seconds have passed. Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()NeilBrown
If you attempt a TCP mount from an host that is unreachable in a way that triggers an immediate error from kernel_connect(), that error does not propagate up, instead EAGAIN is reported. This results in call_connect_status receiving the wrong error. A case that it easy to demonstrate is to attempt to mount from an address that results in ENETUNREACH, but first deleting any default route. Without this patch, the mount.nfs process is persistently runnable and is hard to kill. With this patch it exits as it should. The problem is caused by the fact that xs_tcp_force_close() eventually calls xprt_wake_pending_tasks(xprt, -EAGAIN); which causes an error return of -EAGAIN. so when xs_tcp_setup_sock() calls xprt_wake_pending_tasks(xprt, status); the status is ignored. Fixes: 4efdd92c9211 ("SUNRPC: Remove TCP client connection reset hack") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-30net: dsa: remove dsa_port_is_bridgedVivien Didelot
The helper is only used once and makes the code more complicated that it should. Remove it and reorganize the variables so that it fits on 80 columns. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: mpls: remove unnecessary initialization of errDavid Ahern
err is initialized to EINVAL and not used before it is set again. Remove the unnecessary initialization. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: mpls: Make nla_get_via in af_mpls.cDavid Ahern
nla_get_via is only used in af_mpls.c. Remove declaration from internal.h and move up in af_mpls.c before first use. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: mpls: Add extack messages for route add and delete failuresDavid Ahern
Add error messages for failures in adding and deleting mpls routes. This covers most of the annoying EINVAL errors. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: mpls: Pull common label check into helperDavid Ahern
mpls_route_add and mpls_route_del have the same checks on the label. Move to a helper. Avoid duplicate extack messages in the next patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: Fill in extack for mpls lwt encapDavid Ahern
Fill in extack for errors in build_state for mpls lwt encap including passing extack to nla_get_labels and adding error messages for failures in it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: add extack arg to lwtunnel build stateDavid Ahern
Pass extack arg down to lwtunnel_build_state and the build_state callbacks. Add messages for failures in lwtunnel_build_state, and add the extarg to nla_parse where possible in the build_state callbacks. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: lwtunnel: Add extack to encap attr validationDavid Ahern
Pass extack down to lwtunnel_valid_encap_type and lwtunnel_valid_encap_type_attr. Add messages for unknown or unsupported encap types. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: ipv4: Add extack message for invalid prefix or lengthDavid Ahern
Add extack error message for invalid prefix length and invalid prefix. Example of the latter is a route spec containing 172.16.100.1/24, where the /24 mask means the lower 8-bits should be 0. Amazing how easy that one is to overlook when an EINVAL is returned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30net: ipv4: refactor key and length checksDavid Ahern
fib_table_insert and fib_table_delete have the same checks on the prefix and length. Refactor into a helper. Avoids duplicate extack messages in the next patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30mac80211: Invoke TX LED in more code pathsBjorn Andersson
ieee80211_tx_status() is only one of the possible ways a driver can report a handled packet, some drivers call this for every packet while others calls it rarely or never. In order to invoke the TX LED in the non-status reporting cases this patch pushes the call to ieee80211_led_tx() into ieee80211_report_used_skb(), which is shared between the various code paths. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-05-30skbuff/mac80211: introduce and use skb_put_zero()Johannes Berg
This pattern was introduced a number of times in mac80211 just now, and since it's present in a number of other places it makes sense to add a little helper for it. This just adds the helper and transforms the mac80211 code, a later patch will transform other places. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-05-30mac80211: fix TX aggregation start/stop callback raceJohannes Berg
When starting or stopping an aggregation session, one of the steps is that the driver calls back to mac80211 that the start/stop can proceed. This is handled by queueing up a fake SKB and processing it from the normal iface/sdata work. Since this isn't flushed when disassociating, the following race is possible: * associate * start aggregation session * driver callback * disassociate * associate again to the same AP * callback processing runs, leading to a WARN_ON() that the TID hadn't requested aggregation If the second association isn't to the same AP, there would only be a message printed ("Could not find station: <addr>"), but the same race could happen. Fix this by not going the whole detour with a fake SKB etc. but simply looking up the aggregation session in the driver callback, marking it with a START_CB/STOP_CB bit and then scheduling the regular aggregation work that will now process these bits as well. This also simplifies the code and gets rid of the whole problem with allocation failures of said skb, which could have left the session in limbo. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Conntrack SCTP CRC32c checksum mangling may operate on non-linear skbuff, patch from Davide Caratti. 2) nf_tables rb-tree set backend does not handle element re-addition after deletion in the same transaction, leading to infinite loop. 3) Atomically unclear the IPS_SRC_NAT_DONE_BIT on nat module removal, from Liping Zhang. 4) Conntrack hashtable resizing while ctnetlink dump is progress leads to a dead reference to released objects in the lists, also from Liping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-29netfilter: cttimeout: use nf_ct_iterate_cleanup_net to unlink timeout objsLiping Zhang
Similar to nf_conntrack_helper, we can use nf_ct_iterare_cleanup_net to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_ct_helper: use nf_ct_iterate_destroy to unlink helper objsLiping Zhang
When we unlink the helper objects, we will iterate the nf_conntrack_hash, iterate the unconfirmed list, handle the hash resize situation, etc. Actually this logic is same as the nf_ct_iterate_destroy, so we can use it to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add lookup variant for fixed size hashtablePablo Neira Ayuso
This patch provides a faster variant of the lookup function for 2 and 4 byte keys. Optimizing the one byte case is not worth, as the set backend selection will always select the bitmap set type for such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add non-resizable hashtable implementationPablo Neira Ayuso
This patch adds a simple non-resizable hashtable implementation. If the user specifies the set size, then this new faster hashtable flavour is selected. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: allow large allocations for new setsPablo Neira Ayuso
The new fixed size hashtable backend implementation may result in a large array of buckets that would spew splats from mm. Update this code to fall back on vmalloc in case the memory allocation order is too costly. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add nft_hash_buckets()Pablo Neira Ayuso
Add nft_hash_buckets() helper function to calculate the number of hashtable buckets based on the elements. This function can be reused from the follow up patch to add non-resizable hashtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: pass set description to ->privsizePablo Neira Ayuso
The new non-resizable hashtable variant needs this to calculate the size of the bucket array. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: select set backend flavour depending on descriptionPablo Neira Ayuso
This patch adds the infrastructure to support several implementations of the same set type. This selection will be based on the set description and the features available for this set. This allow us to select set backend implementation that will result in better performance numbers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: use nft_rhash prefix for resizable set backendPablo Neira Ayuso
This patch prepares the introduction of a non-resizable hashtable implementation that is significantly faster. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: no size estimation if number of set elements is unknownPablo Neira Ayuso
This size estimation is ignored by the existing set backend selection logic, since this estimation structure is stack allocated, set this to ~0 to make it easier to catch bugs in future changes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: unnecessary forward declarationPablo Neira Ayuso
Replace struct rhashtable_params forward declaration by the structure definition itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nat: destroy nat mappings on module exit path onlyFlorian Westphal
We don't need pernetns cleanup anymore. If the netns is being destroyed, conntrack netns exit will kill all entries in this namespace, and neither conntrack hash table nor bysource hash are per namespace. For the rmmod case, we have to make sure we remove all entries from the nat bysource table, so call the new nf_ct_iterate_destroy in module exit path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: restart iteration on resizeFlorian Westphal
We could some conntracks when a resize occurs in parallel. Avoid this by sampling generation seqcnt and doing a restart if needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: add nf_ct_iterate_destroyFlorian Westphal
sledgehammer to be used on module unload (to remove affected conntracks from all namespaces). It will also flag all unconfirmed conntracks as dying, i.e. they will not be committed to main table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: don't call iter for non-confirmed conntracksFlorian Westphal
nf_ct_iterate_cleanup_net currently calls iter() callback also for conntracks on the unconfirmed list, but this is unsafe. Acesses to nf_conn are fine, but some users access the extension area in the iter() callback, but that does only work reliably for confirmed conntracks (ct->ext can be reallocated at any time for unconfirmed conntrack). The seond issue is that there is a short window where a conntrack entry is neither on the list nor in the table: To confirm an entry, it is first removed from the unconfirmed list, then insert into the table. Fix this by iterating the unconfirmed list first and marking all entries as dying, then wait for rcu grace period. This makes sure all entries that were about to be confirmed either are in the main table, or will be dropped soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: rename nf_ct_iterate_cleanupFlorian Westphal
There are several places where we needlesly call nf_ct_iterate_cleanup, we should instead iterate the full table at module unload time. This is a leftover from back when the conntrack table got duplicated per net namespace. So rename nf_ct_iterate_cleanup to nf_ct_iterate_cleanup_net. A later patch will then add a non-net variant. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_rt: make local functions staticstephen hemminger
Resolves warnings: net/netfilter/nft_rt.c:26:6: warning: no previous prototype for ‘nft_rt_get_eval’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:75:5: warning: no previous prototype for ‘nft_rt_get_init’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:106:5: warning: no previous prototype for ‘nft_rt_get_dump’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: dup: resolve warnings about missing prototypesstephen hemminger
Missing include file causes: net/netfilter/nf_dup_netdev.c:26:6: warning: no previous prototype for ‘nf_fwd_netdev_egress’ [-Wmissing-prototypes] net/netfilter/nf_dup_netdev.c:40:6: warning: no previous prototype for ‘nf_dup_netdev_egress’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: ipt_CLUSTERIP: switch to nf_register_net_hookFlorian Westphal
one of the last remaining users of the old api, hopefully followup commit can remove it soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: ctnetlink: delete extra spaceslinzhang
This patch cleans up extra spaces. Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-27rtnl: Add support for netdev event to link messagesVlad Yasevich
When netdev events happen, a rtnetlink_event() handler will send messages for every event in it's white list. These messages contain current information about a particular device, but they do not include the iformation about which event just happened. So, it is impossible to tell what just happend for these events. This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT that would have an encoding of event that triggered this message. This would allow the the message consumer to easily determine if it needs to perform certain actions. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net' restricting a HW workaround alongside cleanups in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel Borkmann. 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide Caratti. 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5 driver, from Jesper Brouer. 4) Defer slave->link updates until bonding is ready to do a full commit to the new settings, from Nithin Sujir. 5) Properly reference count ipv4 FIB metrics to avoid use after free situations, from Eric Dumazet and several others including Cong Wang and Julian Anastasov. 6) Fix races in llc_ui_bind(), from Lin Zhang. 7) Fix regression of ESP UDP encapsulation for TCP packets, from Steffen Klassert. 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap. 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter Dawson. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) ipv4: add reference counting to metrics net: ethernet: ax88796: don't call free_irq without request_irq first ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets sctp: fix ICMP processing if skb is non-linear net: llc: add lock_sock in llc_ui_bind to avoid a race condition bonding: Don't update slave->link until ready to commit test_bpf: Add a couple of tests for BPF_JSGE. bpf: add various verifier test cases bpf: fix wrong exposure of map_flags into fdinfo for lpm bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data bpf: properly reset caller saved regs after helper call and ld_abs/ind bpf: fix incorrect pruning decision when alignment must be tracked arp: fixed -Wuninitialized compiler warning tcp: avoid fastopen API to be used on AF_UNSPEC net: move somaxconn init from sysctl code net: fix potential null pointer dereference geneve: fix fill_info when using collect_metadata virtio-net: enable TSO/checksum offloads for Q-in-Q vlans be2net: Fix offload features for Q-in-Q packets vlan: Fix tcp checksum offloads in Q-in-Q vlans ...
2017-05-26bridge: Export multicast enabled stateIdo Schimmel
During enslavement to a bridge, after the CHANGEUPPER is sent, the multicast enabled state of the bridge isn't propagated down to the offloading driver unless it's changed. This patch allows such drivers to query the multicast enabled state from the bridge, so that they'll be able to correctly configure their flood tables during port enslavement. In case multicast is disabled, unregistered multicast packets can be treated as broadcast and be flooded through all the bridge ports. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26bridge: Export VLAN filtering stateIdo Schimmel
It's useful for drivers supporting bridge offload to be able to query the bridge's VLAN filtering state. Currently, upon enslavement to a bridge master, the offloading driver will only learn about the bridge's VLAN filtering state after the bridge device was already linked with its slave. Being able to query the bridge's VLAN filtering state allows such drivers to forbid enslavement in case resource couldn't be allocated for a VLAN-aware bridge and also choose the correct initialization routine for the enslaved port, which is dependent on the bridge type. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26ipv4: add reference counting to metricsEric Dumazet
Andrey Konovalov reported crashes in ipv4_mtu() I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 : 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & 2) At the same time run following loop : while : do ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 done Cong Wang attempted to add back rt->fi in commit 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") but this proved to add some issues that were complex to solve. Instead, I suggested to add a refcount to the metrics themselves, being a standalone object (in particular, no reference to other objects) I tried to make this patch as small as possible to ease its backport, instead of being super clean. Note that we believe that only ipv4 dst need to take care of the metric refcount. But if this is wrong, this patch adds the basic infrastructure to extend this to other families. Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang for his efforts on this problem. Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>