summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-12-18netfilter: nf_tables: Speed up selective rule dumpsPhil Sutter
If just a table name was given, nf_tables_dump_rules() continued over the list of tables even after a match was found. The simple fix is to exit the loop if it reached the bottom and ctx->table was not NULL. When iterating over the table's chains, the same problem as above existed. But worse than that, if a chain name was given the hash table wasn't used to find the corresponding chain. Fix this by introducing a helper function iterating over a chain's rules (and taking care of the cb->args handling), then introduce a shortcut to it if a chain name was given. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nf_nat_sip: fix RTP/RTCP source port translationsAlin Nastac
Each media stream negotiation between 2 SIP peers will trigger creation of 4 different expectations (2 RTP and 2 RTCP): - INVITE will create expectations for the media packets sent by the called peer - reply to the INVITE will create expectations for media packets sent by the caller The dport used by these expectations usually match the ones selected by the SIP peers, but they might get translated due to conflicts with another expectation. When such event occur, it is important to do this translation in both directions, dport translation on the receiving path and sport translation on the sending path. This commit fixes the sport translation when the peer requiring it is also the one that starts the media stream. In this scenario, first media stream packet is forwarded from LAN to WAN and will rely on nf_nat_sip_expected() to do the necessary sport translation. However, the expectation matched by this packet does not contain the necessary information for doing SNAT, this data being stored in the paired expectation created by the sender's SIP message (INVITE or reply to it). Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove nf_nat_l4proto structFlorian Westphal
This removes the (now empty) nf_nat_l4proto struct, all its instances and all the no longer needed runtime (un)register functionality. nf_nat_need_gre() can be axed as well: the module that calls it (to load the no-longer-existing nat_gre module) also calls other nat core functions. GRE nat is now always available if kernel is built with it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove l4proto->manip_pktFlorian Westphal
This removes the last l4proto indirection, the two callers, the l3proto packet mangling helpers for ipv4 and ipv6, now call the nf_nat_l4proto_manip_pkt() helper. nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though they contain no functionality anymore to not clutter this patch. Next patch will remove the empty files and the nf_nat_l4proto struct. nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the other nat manip functionality as well, not just udp and udplite. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove l4proto->nlattr_to_rangeFlorian Westphal
all protocols did set this to nf_nat_l4proto_nlattr_to_range, so just call it directly. The important difference is that we'll now also call it for protocols that we don't support (i.e., nf_nat_proto_unknown did not provide .nlattr_to_range). However, there should be no harm, even icmp provided this callback. If we don't implement a specific l4nat for this, nothing would make use of this information, so adding a big switch/case construct listing all supported l4protocols seems a bit pointless. This change leaves a single function pointer in the l4proto struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove l4proto->in_rangeFlorian Westphal
With exception of icmp, all of the l4 nat protocols set this to nf_nat_l4proto_in_range. Get rid of this and just check the l4proto in the caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: fold in_range indirection into callerFlorian Westphal
No need for indirections here, we only support ipv4 and ipv6 and the called functions are very small. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove l4proto->unique_tupleFlorian Westphal
fold remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple. The static-save of old incarnation of resolved key in gre and icmp is removed as well, just use the prandom based offset like the others. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: un-export nf_nat_l4proto_unique_tupleFlorian Westphal
almost all l4proto->unique_tuple implementations just call this helper, so make ->unique_tuple() optional and call its helper directly if the l4proto doesn't override it. This is an intermediate step to get rid of ->unique_tuple completely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: remove NF_NAT_RANGE_PROTO_RANDOM supportFlorian Westphal
Historically this was net_random() based, and was then converted to a hash based algorithm (private boot seed + hash of endpoint addresses) due to concerns of leaking net_random() bits. RANDOM_FULLY mode was added later to avoid problems with hash based mode (see commit 34ce324019e76, "netfilter: nf_nat: add full port randomization support" for details). Just make prandom_u32() the default search starting point and get rid of ->secure_port() altogether. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: remove unused parameters in nf_ct_l4proto_[un]register_sysctl()Yafang Shao
These parameters aren't used now. So remove them. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: limit port clash resolution attemptsFlorian Westphal
In case almost or all available ports are taken, clash resolution can take a very long time, resulting in soft lockup. This can happen when many to-be-natted hosts connect to same destination:port (e.g. a proxy) and all connections pass the same SNAT. Pick a random offset in the acceptable range, then try ever smaller number of adjacent port numbers, until either the limit is reached or a useable port was found. This results in at most 248 attempts (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset) instead of 64000+, Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove unnecessary 'else if' branchXiaozhou Liu
Since a pseudo-random starting point is used in finding a port in the default case, that 'else if' branch above is no longer a necessity. So remove it to simplify code. Signed-off-by: Xiaozhou Liu <liuxiaozhou@bytedance.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-16bridge: support for ndo_fdb_getRoopa Prabhu
This patch implements ndo_fdb_get for the bridge fdb. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: rtnetlink: support for fdb getRoopa Prabhu
This patch adds support for fdb get similar to route get. arguments can be any of the following (similar to fdb add/del/dump): [bridge, mac, vlan] or [bridge_port, mac, vlan, flags=[NTF_MASTER]] or [dev, mac, [vni|vlan], flags=[NTF_SELF]] Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: dsa: ksz: Add STP multicast handlingMarek Vasut
In case the destination address is link local, add override bit into the switch tag to let such a packet through the switch even if the port is blocked. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: dsa: ksz: Factor out common tag codeTristram Ha
Factor out common code from the tag_ksz , so that the code can be used with other KSZ family switches which use differenly sized tags. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Signed-off-by: Marek Vasut <marex@denx.de> Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477Tristram Ha
Rename the tag Kconfig option and related macros in preparation for addition of new KSZ family switches with different tag formats. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Signed-off-by: Marek Vasut <marex@denx.de> Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16Revert "net: dccp: initialize (addr,port) listening hashtable"David S. Miller
This reverts commit ec49d83f245453515a9b6e88324e27bbcb69fbae. Cause build failures when DCCP is modular. ERROR: "inet_hashinfo2_init" [net/dccp/dccp.ko] undefined! Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16neighbor: Add protocol attributeDavid Ahern
Similar to routes and rules, add protocol attribute to neighbor entries for easier tracking of how each was created. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: dccp: initialize (addr,port) listening hashtablePeter Oskolkov
Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address" removes port-only listener lookups. This caused segfaults in DCCP lookups because DCCP did not initialize the (addr,port) hashtable. This patch adds said initialization. The only non-trivial issue here is the size of the new hashtable. It seemed reasonable to make it match the size of the port-only hashtable (= INET_LHTABLE_SIZE) that was used previously. Other parameters to inet_hashinfo2_init() match those used in TCP. Tested: syzcaller issues fixed; the second patch in the patchset tests that DCCP lookups work correctly. Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address" Reported-by: syzcaller <syzkaller@googlegroups.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15l2tp: Add protocol field decompressionSam Protsenko
When Protocol Field Compression (PFC) is enabled, the "Protocol" field in PPP packet will be received without leading 0x00. See section 6.5 in RFC 1661 for details. So let's decompress protocol field if needed, the same way it's done in drivers/net/ppp/pptp.c. In case when "nopcomp" pppd option is not enabled, PFC (pcomp) can be negotiated during LCP handshake, and L2TP driver in kernel will receive PPP packets with compressed Protocol field, which in turn leads to next error: Protocol Rejected (unsupported protocol 0x2145) because instead of Protocol=0x0021 in PPP packet there will be Protocol=0x21. This patch unwraps it back to 0x0021, which fixes the issue. Sending the compressed Protocol field will be implemented in subsequent patch, this one is self-sufficient. Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: clear skb->tstamp in forwarding pathsEric Dumazet
Sergey reported that forwarding was no longer working if fq packet scheduler was used. This is caused by the recent switch to EDT model, since incoming packets might have been timestamped by __net_timestamp() __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base. The fix is to clear skb->tstamp in forwarding paths. Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sergey Matyukevich <geomatsi@gmail.com> Tested-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15udp: use indirect call wrappers for GRO socket lookupPaolo Abeni
This avoids another indirect call for UDP GRO. Again, the test for the IPv6 variant is performed first. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: use indirect call wrappers at GRO transport layerPaolo Abeni
This avoids an indirect call in the receive path for TCP and UDP packets. TCP takes precedence on UDP, so that we have a single additional conditional in the common case. When IPV6 is build as module, all gro symbols except UDPv6 are builtin, while the latter belong to the ipv6 module, so we need some special care. v1 -> v2: - adapted to INDIRECT_CALL_ changes v2 -> v3: - fix build issue with CONFIG_IPV6=m Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: use indirect call wrappers at GRO network layerPaolo Abeni
This avoids an indirect calls for L3 GRO receive path, both for ipv4 and ipv6, if the latter is not compiled as a module. Note that when IPv6 is compiled as builtin, it will be checked first, so we have a single additional compare for the more common path. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: ipv4: do not handle duplicate fragments as overlappingMichal Kubecek
Since commit 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") IPv4 reassembly code drops the whole queue whenever an overlapping fragment is received. However, the test is written in a way which detects duplicate fragments as overlapping so that in environments with many duplicate packets, fragmented packets may be undeliverable. Add an extra test and for (potentially) duplicate fragment, only drop the new fragment rather than the whole queue. Only starting offset and length are checked, not the contents of the fragments as that would be too expensive. For similar reason, linear list ("run") of a rbtree node is not iterated, we only check if the new fragment is a subset of the interval covered by existing consecutive fragments. v2: instead of an exact check iterating through linear list of an rbtree node, only check if the new fragment is subset of the "run" (suggested by Eric Dumazet) Fixes: 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: sched: simplify the qdisc_leaf codeTonghao Zhang
Except for returning, the var leaf is not used in the qdisc_leaf(). For simplicity, remove it. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipv6: Fix handling of LLA with VRF and sockets bound to VRFDavid Ahern
A recent commit allows sockets bound to a VRF to receive ipv6 link local packets. However, it only works for UDP and worse TCP connection attempts to the LLA with the only listener bound to the VRF just hang where as before the client gets a reset and connection refused. Fix by adjusting ir_iif for LL addresses and packets received through a device enslaved to a VRF. Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF") Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipconfig: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix liveness propagation of callee saved registers, from Jakub. 2) fix overflow in bpf_jit_limit knob, from Daniel. 3) bpf_flow_dissector api fix, from Stanislav. 4) bpf_perf_event api fix on powerpc, from Sandipan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: tcp6: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: tcp: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: udp6: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: udp: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14tipc: check tsk->group in tipc_wait_for_cond()Cong Wang
tipc_wait_for_cond() drops socket lock before going to sleep, but tsk->group could be freed right after that release_sock(). So we have to re-check and reload tsk->group after it wakes up. After this patch, tipc_wait_for_cond() returns -ERESTARTSYS when tsk->group is NULL, instead of continuing with the assumption of a non-NULL tsk->group. (It looks like 'dsts' should be re-checked and reloaded too, but it is a different bug.) Similar for tipc_send_group_unicast() and tipc_send_group_anycast(). Reported-by: syzbot+10a9db47c3a0e13eb31c@syzkaller.appspotmail.com Fixes: b7d42635517f ("tipc: introduce flow control for group broadcast messages") Fixes: ee106d7f942d ("tipc: introduce group anycast messaging") Fixes: 27bd9ec027f3 ("tipc: introduce group unicast messaging") Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Remove externally learned entries from gc_listDavid Ahern
Externally learned entries are similar to PERMANENT entries in the sense they are managed by userspace and can not be garbage collected. As such remove them from the gc_list, remove the flags check from neigh_forced_gc and skip threshold checks in neigh_alloc. As with PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Move neigh_update_ext_learned to core fileDavid Ahern
neigh_update_ext_learned has one caller in neighbour.c so does not need to be defined in the header. Move it and in the process remove the intialization of ndm_flags and just set it based on the flags check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Remove state and flags arguments to neigh_delDavid Ahern
neigh_del now only has 1 caller, and the state and flags arguments are both 0. Remove them and simplify neigh_del. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Fix state check in neigh_forced_gcDavid Ahern
PERMANENT entries are not on the gc_list so the state check is now redundant. Also, the move to not purge entries until after 5 seconds should not apply to FAILED entries; those can be removed immediately to make way for newer ones. This restores the previous logic prior to the gc_list. Fixes: 58956317c8de ("neighbor: Improve garbage collection") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Fix locking order for gc_list changesDavid Ahern
Lock checker noted an inverted lock order between neigh_change_state (neighbor lock then table lock) and neigh_periodic_work (table lock and then neighbor lock) resulting in: [ 121.057652] ====================================================== [ 121.058740] WARNING: possible circular locking dependency detected [ 121.059861] 4.20.0-rc6+ #43 Not tainted [ 121.060546] ------------------------------------------------------ [ 121.061630] kworker/0:2/65 is trying to acquire lock: [ 121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324 [ 121.063894] [ 121.063894] but task is already holding lock: [ 121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324 [ 121.066274] [ 121.066274] which lock already depends on the new lock. [ 121.066274] [ 121.067693] [ 121.067693] the existing dependency chain (in reverse order) is: ... Fix by renaming neigh_change_state to neigh_update_gc_list, changing it to only manage whether an entry should be on the gc_list and taking locks in the same order as neigh_periodic_work. Invoke at the end of neigh_update only if diff between old or new states has the PERMANENT flag set. Fixes: 8cc196d6ef86 ("neighbor: gc_list changes should be protected by table lock") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: Allow class-e address assignment via ifconfig ioctlDave Taht
While most distributions long ago switched to the iproute2 suite of utilities, which allow class-e (240.0.0.0/4) address assignment, distributions relying on busybox, toybox and other forms of ifconfig cannot assign class-e addresses without this kernel patch. While CIDR has been obsolete for 2 decades, and a survey of all the open source code in the world shows the IN_whatever macros are also obsolete... rather than obsolete CIDR from this ioctl entirely, this patch merely enables class-e assignment, sanely. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14ip6mr: Fix potential Spectre v1 vulnerabilityGustavo A. R. Silva
vr.mifi is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) Fix this by sanitizing vr.mifi before using it to index mrt->vif_table' Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net_sched: fold tcf_block_cb_call() into tc_setup_cb_call()Cong Wang
After commit 69bd48404f25 ("net/sched: Remove egdev mechanism"), tc_setup_cb_call() is nearly identical to tcf_block_cb_call(), so we can just fold tcf_block_cb_call() into tc_setup_cb_call() and remove its unused parameter 'exts'. Fixes: 69bd48404f25 ("net/sched: Remove egdev mechanism") Cc: Oz Shlomo <ozsh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14VSOCK: bind to random port for VMADDR_PORT_ANYLepton Wu
The old code always starts from fixed port for VMADDR_PORT_ANY. Sometimes when VMM crashed, there is still orphaned vsock which is waiting for close timer, then it could cause connection time out for new started VM if they are trying to connect to same port with same guest cid since the new packets could hit that orphaned vsock. We could also fix this by doing more in vhost_vsock_reset_orphans, but any way, it should be better to start from a random local port instead of a fixed one. Signed-off-by: Lepton Wu <ytht.net@gmail.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net/tls: sleeping function from invalid contextAtul Gupta
HW unhash within mutex for registered tls devices cause sleep when called from tcp_set_state for TCP_CLOSE. Release lock and re-acquire after function call with ref count incr/dec. defined kref and fp release for tls_device to ensure device is not released outside lock. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7 INFO: lockdep is turned off. CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O Call Trace: <IRQ> dump_stack+0x5e/0x8b ___might_sleep+0x222/0x260 __mutex_lock+0x5c/0xa50 ? vprintk_emit+0x1f3/0x440 ? kmem_cache_free+0x22d/0x2a0 ? tls_hw_unhash+0x2f/0x80 ? printk+0x52/0x6e ? tls_hw_unhash+0x2f/0x80 tls_hw_unhash+0x2f/0x80 tcp_set_state+0x5f/0x180 tcp_done+0x2e/0xe0 tcp_rcv_state_process+0x92c/0xdd3 ? lock_acquire+0xf5/0x1f0 ? tcp_v4_rcv+0xa7c/0xbe0 ? tcp_v4_do_rcv+0x70/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net/tls: Init routines in create_ctxAtul Gupta
create_ctx is called from tls_init and tls_hw_prot hence initialize function pointers in common routine. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14tipc: compare remote and local protocols in tipc_udp_enable()Cong Wang
When TIPC_NLA_UDP_REMOTE is an IPv6 mcast address but TIPC_NLA_UDP_LOCAL is an IPv4 address, a NULL-ptr deref is triggered as the UDP tunnel sock is initialized to IPv4 or IPv6 sock merely based on the protocol in local address. We should just error out when the remote address and local address have different protocols. Reported-by: syzbot+eb4da3a20fad2e52555d@syzkaller.appspotmail.com Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14tipc: fix a double kfree_skb()Cong Wang
tipc_udp_xmit() drops the packet on error, there is no need to drop it again. Fixes: ef20cd4dd163 ("tipc: introduce UDP replicast") Reported-and-tested-by: syzbot+eae585ba2cc2752d3704@syzkaller.appspotmail.com Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14tipc: use lock_sock() in tipc_sk_reinit()Cong Wang
lock_sock() must be used in process context to be race-free with other lock_sock() callers, for example, tipc_release(). Otherwise using the spinlock directly can't serialize a parallel tipc_release(). As it is blocking, we have to hold the sock refcnt before rhashtable_walk_stop() and release it after rhashtable_walk_start(). Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to use generic rhashtable") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>