summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-01selftests: fixes for UDP GROPaolo Abeni
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sending is still transmitting and incorrectly report RX errors, with a wrong number of packet received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: marvell: neta: disable comphy when setting modeMarek Behún
The comphy driver for Armada 3700 by Miquèl Raynal (which is currently in linux-next) does not actually set comphy mode when phy_set_mode_ext is called. The mode is set at next call of phy_power_on. Update the driver to semantics similar to mvpp2: helper mvneta_comphy_init sets comphy mode and powers it on. When mode is to be changed in mvneta_mac_config, first power the comphy off, then call mvneta_comphy_init (which sets the mode to new one). Only do this when new mode is different from old mode. This should also work for Armada 38x, since in that comphy driver methods power_on and power_off are unimplemented. Signed-off-by: Marek Behún <marek.behun@nic.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge branch 'enetc-Add-mdio-support-and-device-tree-nodes'David S. Miller
Claudiu Manoil says: ==================== enetc: Add mdio support and device tree nodes This is the missing part to enable PCI probing of the ENETC ethernet ports on the LS1028A SoC and external traffic on the LS1028A RDB board. It's one of the first items on the TODO list for the recently merged ENETC ethernet driver. v3: Add DT bindings doc for ENETC connections v4: none ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01dt-bindings: net: freescale: enetc: Add connection bindings for ENETC ↵Claudiu Manoil
ethernet nodes Define connection bindings (external PHY connections and internal links) for the ENETC on-chip ethernet controllers. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01enetc: Add ENETC PF level external MDIO supportClaudiu Manoil
Each ENETC PF has its own MDIO interface, the corresponding MDIO registers are mapped in the ENETC's Port register block. The current patch adds a driver for these PF level MDIO buses, so that each PF can manage directly its own external link. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A ↵Claudiu Manoil
RDB board The LS1028A RDB board features an Atheros PHY connected over SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no external connection on this board, so it can be disabled for now. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpointsClaudiu Manoil
The LS1028A SoC features a PCI Integrated Endpoint Root Complex (IERC) defining several integrated PCI devices, including the ENETC ethernet controller integrated endpoints (IEPs). The IERC implements ECAM (Enhanced Configuration Access Mechanism) to provide access to the PCIe config space of the IEPs. This means the the IEPs (including ENETC) do not support the standard PCIe BARs, instead the Enhanced Allocation (EA) capability structures in the ECAM space are used to fix the base addresses in the system, and the PCI subsystem uses these structures for device enumeration and discovery. The "ranges" entries contain basic information from these EA capabily structures required by the kernel for device enumeration. The current patch also enables the first 2 ENETC PFs (Physiscal Functions) and the associated VFs (Virtual Functions), 2 VFs for each PF. Each of these ENETC PFs has an external ethernet port on the LS1028A SoC. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file descriptor is supposed to return to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01netfilter: nf_tables: merge ipv4 and ipv6 nat chain typesFlorian Westphal
Merge the ipv4 and ipv6 nat chain type. This is the last missing piece which allows to provide inet family support for nat in a follow patch. The kconfig knobs for ipv4/ipv6 nat chain are removed, the nat chain type will be built unconditionally if NFT_NAT expression is enabled. Before: text data bss dec hex filename 1576 896 0 2472 9a8 nft_chain_nat_ipv4.ko 1697 896 0 2593 a21 nft_chain_nat_ipv6.ko After: text data bss dec hex filename 1832 896 0 2728 aa8 nft_chain_nat.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_masq protocol specific modulesFlorian Westphal
The family specific masq modules are way too small to warrant an extra module, just place all of them in nft_masq. before: text data bss dec hex filename 1001 832 0 1833 729 nft_masq.ko 766 896 0 1662 67e nft_masq_ipv4.ko 764 896 0 1660 67c nft_masq_ipv6.ko after: 2010 960 0 2970 b9a nft_masq.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_redir protocol specific modulesFlorian Westphal
before: text data bss dec hex filename 990 832 0 1822 71e nft_redir.ko 697 896 0 1593 639 nft_redir_ipv4.ko 713 896 0 1609 649 nft_redir_ipv6.ko after: text data bss dec hex filename 1910 960 0 2870 b36 nft_redir.ko size is reduced, all helpers from nft_redir.ko can be made static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: xt_IDLETIMER: fix sysfs callback function typeSami Tolvanen
Use struct device_attribute instead of struct idletimer_tg_attr, and the correct callback function type to avoid indirect call mismatches with Control Flow Integrity checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2Li RongQing
CONNTRACK_LOCKS is divisor when computer array index, if it is power of 2, compiler will optimize modulo operation as bitwise AND, or else modulo will lower performance. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: check the result of dereferencing base_chain->statsLi RongQing
Check the result of dereferencing base_chain->stats, instead of result of this_cpu_ptr with NULL. base_chain->stats maybe be changed to NULL when a chain is updated and a new NULL counter can be attached. And we do not need to check returning of this_cpu_ptr since base_chain->stats is from percpu allocator if it is non-NULL, this_cpu_ptr returns a valid value. And fix two sparse error by replacing rcu_access_pointer and rcu_dereference with READ_ONCE under rcu_read_lock. Thanks for Eric's help to finish this patch. Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set") Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slaveDavid Ahern
Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in to not drop packets for slave devices too. Currently, neighbor solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting dropped. This patch enables IPv6 communications for bridges with an address that are enslaved to a VRF. Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01ipvs: get sctphdr by sctphoff in sctp_csum_checkXin Long
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls skb_make_writable() to ensure sctphdr to be linearized. So there's no need to get sctphdr by calling skb_header_pointer() in sctp_csum_check(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: convert the proto argument from u8 to u16Li RongQing
The proto in struct xt_match and struct xt_target is u16, when calling xt_check_target/match, their proto argument is u8, and will cause truncation, it is harmless to ip packet, since ip proto is u8 if a etable's match/target has proto that is u16, will cause the check failure. and convert be16 to short in bridge/netfilter/ebtables.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nft_tunnel: Add dst_cache supportwenxu
The metadata_dst does not initialize the dst_cache field, this causes problems to ip_md_tunnel_xmit() since it cannot use this cache, hence, Triggering a route lookup for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: conntrack: tcp: only close if RST matches exact sequenceFlorian Westphal
TCP resets cause instant transition from established to closed state provided the reset is in-window. Endpoints that implement RFC 5961 require resets to match the next expected sequence number. RST segments that are in-window (but that do not match RCV.NXT) are ignored, and a "challenge ACK" is sent back. Main problem for conntrack is that its a middlebox, i.e. whereas an end host might have ACK'd SEQ (and would thus accept an RST with this sequence number), conntrack might not have seen this ACK (yet). Therefore we can't simply flag RSTs with non-exact match as invalid. This updates RST processing as follows: 1. If the connection is in a state other than ESTABLISHED, nothing is changed, RST is subject to normal in-window check. 2. If the RSTs sequence number either matches exactly RCV.NXT, connection state moves to CLOSE. 3. The same applies if the RST sequence number aligns with a previous packet in the same direction. In all other cases, the connection remains in ESTABLISHED state. If the normal-in-window check passes, the timeout will be lowered to that of CLOSE. If the peer sends a challenge ack, connection timeout will be reset. If the challenge ACK triggers another RST (RST was valid after all), this 2nd RST will match expected sequence and conntrack state changes to CLOSE. If no challenge ACK is received, the connection will time out after CLOSE seconds (10 seconds by default), just like without this patch. Packetdrill test case: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Receive a segment. 0.210 < P. 1:1001(1000) ack 1 win 46 0.210 > . 1:1(0) ack 1001 // Application writes 1000 bytes. 0.250 write(4, ..., 1000) = 1000 0.250 > P. 1:1001(1000) ack 1001 // First reset, old sequence. Conntrack (correctly) considers this // invalid due to failed window validation (regardless of this patch). 0.260 < R 2:2(0) ack 1001 win 260 // 2nd reset, but too far ahead sequence. Same: correctly handled // as invalid. 0.270 < R 99990001:99990001(0) ack 1001 win 260 // in-window, but not exact sequence. // Current Linux kernels might reply with a challenge ack, and do not // remove connection. // Without this patch, conntrack state moves to CLOSE. // With patch, timeout is lowered like CLOSE, but connection stays // in ESTABLISHED state. 0.280 < R 1010:1010(0) ack 1001 win 260 // Expect challenge ACK 0.281 > . 1001:1001(0) ack 1001 win 501 // With or without this patch, RST will cause connection // to move to CLOSE (sequence number matches) // 0.282 < R 1001:1001(0) ack 1001 win 260 // ACK 0.300 < . 1001:1001(0) ack 1001 win 257 // more data could be exchanged here, connection // is still established // Client closes the connection. 0.610 < F. 1001:1001(0) ack 1001 win 260 0.650 > . 1001:1001(0) ack 1002 // Close the connection without reading outstanding data 0.700 close(4) = 0 // so one more reset. Will be deemed acceptable with patch as well: // connection is already closing. 0.701 > R. 1001:1001(0) ack 1002 win 501 // End packetdrill test case. With patch, this generates following conntrack events: [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] Without patch, first RST moves connection to close, whereas socket state does not change until FIN is received. [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01ipvs: change some data types from int to boolAndrea Claudi
Change the data type of the following variables from int to bool across ipvs code: - found - loop - need_full_dest - need_full_svc - payload_csum Also change the following functions to use bool full_entry param instead of int: - ip_vs_genl_parse_dest() - ip_vs_genl_parse_service() This patch does not change any functionality but makes the source code slightly easier to read. Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-28net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390XMaxime Chevallier
Upon setting the cmode on 6390 and 6390X, the associated serdes interfaces must be powered off/on. Both 6390X and 6390 share code to do so, but it currently uses the 6390 specific helper mv88e6390_serdes_power() to disable and enable the serdes interface. This call will fail silently on 6390X when trying so set a 10G interface such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs the lane number based on modes supported by the 6390, and returns 0 when getting -ENODEV as a lane number. Using mv88e6390x_serdes_power() should be safe here, since we explicitly rule-out all ports but the 9 and 10, and because modes supported by 6390 ports 9 and 10 are a subset of those supported on 6390X. This was tested on 6390X using RXAUI mode. Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28selftests: rtnetlink: use internal netns switch for ip commandsDavid Ahern
'ip' can switch network namespaces internally and then run a given command relative to that namespace without the need to fork and exec another ip instance. Update all references of the form: ip netns exec "$testns" ip ... to ip -netns "$testns" ... Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28Merge branch 's390-qeth-next'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: updates 2019-02-28 please apply one more qeth patch series for net-next. This eliminates some of the quirks in our reset code, and slims down the internal state machine. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: drop redundant state checkingJulian Wiedmann
Now that qeth always uses dev_close() to shutdown the interface, we can trust the locking and remove some custom state checks. qeth_l?_stop_card() is no longer called for a card in UP state, so remove the checks there too. This basically makes the UP state obsolete, so rip out the whole thing (except for the sysfs-visible string). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: don't special-case HW trap during suspendJulian Wiedmann
It makes no difference whether we 1. manually disarm the HW trap and call the offline code with recovery_mode == 1, or 2. call the offline code with recovery_mode == 0, and let it disarm the HW trap for us. So consolidate the two code paths in the suspend callback. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove driver-wide workqueueJulian Wiedmann
The qeth-wide workqueue is now only used by a single caller to schedule close_dev work. Just put it on a system queue instead. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: don't defer close_dev work during recoveryJulian Wiedmann
The recovery code already runs in a kthread, we don't have to defer the offlining further. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove a redundant check for card->devJulian Wiedmann
smatch complains that __qeth_l3_set_offline() first accesses card->dev, and then later checks whether the pointer is valid. Since commit d3d1b205e89f ("s390/qeth: allocate netdevice early"), the pointer is _always_ valid - that patch merely missed to remove this one check. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: call dev_close() during recoveryJulian Wiedmann
When resetting an interface ("recovery"), qeth currently attempts to elide the call to dev_close(). We initially only call .ndo_close to quiesce the data path, and then offline & online the ccwgroup device. If the reset succeeded, a call to .ndo_open then resumes the data path along with some internal setup (dev_addr validation, RX modeset) that dev_open() would have usually triggered. dev_close() only gets called (via the close_dev worker) if the reset action fails. It's unclear whether this was initially done due to locking concerns, or rather to execute the reset transparently. Either way, temporarily closing the interface without dev_close() is fragile, and means we're susceptible to various races and unexpected behaviour. For instance: - Bypassing dev_deactivate_many() means that the qdiscs are not set to __QDISC_STATE_DEACTIVATED. Consequently any intermittent TX completion can wake up the txq, resulting in calls to .ndo_start_xmit while the data path is down. We have custom state checking to detect this case and drop such packets. - Because the IFF_UP flag doesn't reflect the interface's actual state during a reset, we have custom state checking in .ndo_open and .ndo_close to guard against invalid calls. - Considering that the reset might take a considerable amount of time (in particular if an IO fails and we end up waiting for its timeout), we _do_ want NETDEV_GOING_DOWN and NETDEV_DOWN events so that components like bonding, team, bridge, macvlan, vlan, ... can take appropriate action. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: unconditionally clear MAC_REGISTERED flagJulian Wiedmann
In its attempt to run only the minimal amount of tear down steps, qeth_l2_stop_card() fails to reset the "is dev_addr registered?" flag in some rare scenarios. But a future change to the tear down sequence would cause us to _always_ hit this issue, so patch it up before that code lands. Fix it by unconditionally clearing the flag bit. This also allows us to remove the additional cleanup step in qeth_dev_layer2_store(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: enable/disable the HW trap a little earlierJulian Wiedmann
When setting a L2 qeth device online, enable the HW trap as soon as the control plane is available. This allows us to catch any error that occurs during the very first commands. In the same spirit, the offline code should disable the HW trap as the very first step of its processing. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28s390/qeth: remove RECOVER stateJulian Wiedmann
The offline code uses a specific RECOVER state to indicate that the interface should be brought up when a qeth device is set online again. Rather than having a specific card-state for this, just put it in an internal flag bit and set the state to DOWN. When working with the card's state transitions, this reduces the complexity quite a bit. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: dsa: mv88e6xxx: Fix u64 statisticsAndrew Lunn
The switch maintains u64 counters for the number of octets sent and received. These are kept as two u32's which need to be combined. Fix the combing, which wrongly worked on u16's. Fixes: 80c4627b2719 ("dsa: mv88x6xxx: Refactor getting a single statistic") Reported-by: Chris Healy <Chris.Healy@zii.aero> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: don't populate the hash cache on XenBus disconnectIgor Druzhinin
Occasionally, during the disconnection procedure on XenBus which includes hash cache deinitialization there might be some packets still in-flight on other processors. Handling of these packets includes hashing and hash cache population that finally results in hash cache data structure corruption. In order to avoid this we prevent hashing of those packets if there are no queues initialized. In that case RCU protection of queues guards the hash cache as well. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net/smc: allow pnetid-less configurationUrsula Braun
Without hardware pnetid support there must currently be a pnet table configured to determine the IB device port to be used for SMC RDMA traffic. This patch enables a setup without pnet table, if the used handshake interface belongs already to a RoCE port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28xen-netback: fix occasional leak of grant ref mappings under memory pressureIgor Druzhinin
Zero-copy callback flag is not yet set on frag list skb at the moment xenvif_handle_frag_list() returns -ENOMEM. This eventually results in leaking grant ref mappings since xenvif_zerocopy_callback() is never called for these fragments. Those eventually build up and cause Xen to kill Dom0 as the slots get reused for new mappings: "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005" That behavior is observed under certain workloads where sudden spikes of page cache writes coexist with active atomic skb allocations from network traffic. Additionally, rework the logic to deal with frag_list deallocation in a single place. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: sched: pie: avoid slow division in drop probability decayLeslie Monis
As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28sctp: chunk.c: correct format string for size_t in printkMatthias Maennich
According to Documentation/core-api/printk-formats.rst, size_t should be printed with %zu, rather than %Zu. In addition, using %Zu triggers a warning on clang (-Wformat-extra-args): net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args] __func__, asoc, max_data); ~~~~~~~~~~~~~~~~^~~~~~~~~ ./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited' printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ ./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited' printk(fmt, ##__VA_ARGS__); \ ~~~ ^ Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: netem: fix skb length BUG_ON in __skb_to_sgvecSheng Lan
It can be reproduced by following steps: 1. virtio_net NIC is configured with gso/tso on 2. configure nginx as http server with an index file bigger than 1M bytes 3. use tc netem to produce duplicate packets and delay: tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90% 4. continually curl the nginx http server to get index file on client 5. BUG_ON is seen quickly [10258690.371129] kernel BUG at net/core/skbuff.c:4028! [10258690.371748] invalid opcode: 0000 [#1] SMP PTI [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2 [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202 [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002 [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028 [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000 [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0 [10258690.372094] Call Trace: [10258690.372094] <IRQ> [10258690.372094] skb_to_sgvec+0x11/0x40 [10258690.372094] start_xmit+0x38c/0x520 [virtio_net] [10258690.372094] dev_hard_start_xmit+0x9b/0x200 [10258690.372094] sch_direct_xmit+0xff/0x260 [10258690.372094] __qdisc_run+0x15e/0x4e0 [10258690.372094] net_tx_action+0x137/0x210 [10258690.372094] __do_softirq+0xd6/0x2a9 [10258690.372094] irq_exit+0xde/0xf0 [10258690.372094] smp_apic_timer_interrupt+0x74/0x140 [10258690.372094] apic_timer_interrupt+0xf/0x20 [10258690.372094] </IRQ> In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, thus BUG_ON triggered. Because the skb is cloned and a part of nonlinear data is split off. Duplicate packet is cloned in netem_enqueue() and may be delayed some time in qdisc. When qdisc len reached the limit and returns NET_XMIT_DROP, the skb will be retransmit later in write queue. the skb will be fragmented by tso_fragment(), the limit size that depends on cwnd and mss decrease, the skb's nonlinear data will be split off. The length of the skb cloned by netem will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON trigger. To fix it, netem returns NET_XMIT_SUCCESS to upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28cxgb4vf: Enter debugging mode if FW is inaccessibleArjun Vynipadath
If we are not able to reach firmware, enter debugging mode that will help us to get adapter logs. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28cxgb4: Enable outer UDP checksum offload for T6Arjun Vynipadath
T6 adapters support outer UDP checksum offload for encapsulated packets, hence enabling netdev feature flag NETIF_F_GSO_UDP_TUNNEL_CSUM. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28cxgb4/cxgb4vf: Fix up netdev->hw_featuresArjun Vynipadath
GRO is done by cxgb4/cxgb4vf. Hence set NETIF_F_GRO flag for both cxgb4/cxgb4vf. Cleaned up VLAN netdev features in cxgb4vf. Also fixed NETIF_F_HIGHDMA being set unconditionally for vlan netdev features. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo
ath.git patches for 5.1. Major changes: ath10k * more preparation for SDIO support wil6210 * support up to 20 stations in AP mode
2019-02-28wil6210: check null pointer in _wil_cfg80211_merge_extra_iesAlexei Avshalom Lazar
ies1 or ies2 might be null when code inside _wil_cfg80211_merge_extra_ies access them. Add explicit check for null and make sure ies1/ies2 are not accessed in such a case. spos might be null and be accessed inside _wil_cfg80211_merge_extra_ies. Add explicit check for null in the while condition statement and make sure spos is not accessed in such a case. Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: ignore HALP ICR if already handledMaya Erez
HALP ICR is set as long as the FW should stay awake. To prevent its multiple handling the driver masks this IRQ bit. However, if there is a different MISC ICR before the driver clears this bit, there is a risk of race condition between HALP mask and unmask. This race leads to HALP timeout, in case it is mistakenly masked. Add an atomic flag to indicate if HALP ICR should be handled. Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: fix invalid sta statistics updateDedy Lansky
Upon status ring handling, in case there are both unicast and multicast (cid == max) status messages to handle, wrong sta statistics might get updated. Fix this by setting stats to NULL upon invalid cid (e.g. == max_assoc_sta). Signed-off-by: Dedy Lansky <dlansky@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: accessing 802.3 addresses via utility functionsAhmad Masri
Rearrange the code by having functions to access 802.3 header members, source and destination addresses. Signed-off-by: Ahmad Masri <amasri@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: support up to 20 stations in AP modeAhmad Masri
New FW added support for upto 20 clients in AP mode. Change the driver to support this as well. FW reports it's max supported associations in WMI_READY_EVENT. Some WMI commands/events use cidxtid field which is limited to 16 cids. Use new cid/tid fields instead. For Rx packets cid from rx descriptor is limited to 3 bits (0..7), to find the real cid, compare transmitter address with the stored stations mac address in the driver sta array. EDMA FW still supports 8 stations. Extending the support to 20 stations will come later. Signed-off-by: Ahmad Masri <amasri@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: add option to drop Tx packets when Tx ring is fullDedy Lansky
In AP mode with multiple clients, driver stops net queue (netif_tx_stop_queue) upon first ring (serving specific client) becoming full. This can have negative effect on transmission to other clients which may still have room in their corresponding rings. Implement new policy in which stop/wake net queue are not used. In case there is no room in the ring for a transmitted packet, drop the packet. New policy can be helpful to debug performance issues, to guarantee maximum utilization of net queues. New policy is disabled by default and can be enabled by debugfs: echo 1 > drop_if_ring_full Signed-off-by: Dedy Lansky <dlansky@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-28wil6210: remove rtap_include_phy_info module paramMaya Erez
Due to a HW issue in PHY info collection rtap_include_phy_info is not in use, hence can be removed. Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>