summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-03-01socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)Soheil Hassas Yeganeh
recvmmsg does not call ___sys_recvmsg when sk_err is set. That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg should always call ___sys_recvmsg regardless of sk->sk_err to be able to clear error queue. Otherwise, users are not able to drain the error queue using recvmmsg. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: allow interface to be set into VRF if VLAN interface in same VRFMike Manning
Setting an interface into a VRF fails with 'RTNETLINK answers: File exists' if one of its VLAN interfaces is already in the same VRF. As the VRF is an upper device of the VLAN interface, it is also showing up as an upper device of the interface itself. The solution is to restrict this check to devices other than master. As only one master device can be linked to a device, the check in this case is that the upper device (VRF) being linked to is not the same as the master device instead of it not being any one of the upper devices. The following example shows an interface ens12 (with a VLAN interface ens12.10) being set into VRF green, which behaves as expected: # ip link add link ens12 ens12.10 type vlan id 10 # ip link set dev ens12 master vrfgreen # ip link show dev ens12 3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff But if the VLAN interface has previously been set into the same VRF, then setting the interface into the VRF fails: # ip link set dev ens12 nomaster # ip link set dev ens12.10 master vrfgreen # ip link show dev ens12.10 39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff # ip link set dev ens12 master vrfgreen RTNETLINK answers: File exists The workaround is to move the VLAN interface back into the default VRF beforehand, but it has to be shut first so as to avoid the risk of traffic leaking from the VRF. This fix avoids needing this workaround. Signed-off-by: Mike Manning <mmanning@att.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01Bluetooth: Fix missing encryption refresh on Security RequestSzymon Janc
If Security Request is received on connection that is already encrypted with sufficient security master should perform encryption key refresh procedure instead of just ignoring Slave Security Request (Core Spec 5.0 Vol 3 Part H 2.4.6). > ACL Data RX: Handle 3585 flags 0x02 dlen 6 SMP: Security Request (0x0b) len 1 Authentication requirement: Bonding, No MITM, SC, No Keypresses (0x09) < HCI Command: LE Start Encryption (0x08|0x0019) plen 28 Handle: 3585 Random number: 0x0000000000000000 Encrypted diversifier: 0x0000 Long term key: 44264272a5c426a9e868f034cf0e69f3 > HCI Event: Command Status (0x0f) plen 4 LE Start Encryption (0x08|0x0019) ncmd 1 Status: Success (0x00) > HCI Event: Encryption Key Refresh Complete (0x30) plen 3 Status: Success (0x00) Handle: 3585 Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-03-01ipv6: allow userspace to add IFA_F_OPTIMISTIC addressesSabrina Dubroca
According to RFC 4429 (section 3.1), adding new IPv6 addresses as optimistic addresses is acceptable, as long as the implementation follows some rules: * Optimistic DAD SHOULD only be used when the implementation is aware that the address is based on a most likely unique interface identifier (such as in [RFC2464]), generated randomly [RFC3041], or by a well-distributed hash function [RFC3972] or assigned by Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315]. Optimistic DAD SHOULD NOT be used for manually entered addresses. Thus, it seems reasonable to allow userspace to set the optimistic flag when adding new addresses. We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is not performing DAD we would never clear the optimistic flag. We must also ignore userspace's request to add OPTIMISTIC flag to addresses that have already completed DAD (addresses that don't have the TENTATIVE flag, or that have the DADFAILED flag). Then we also need to clear the OPTIMISTIC flag on permanent addresses when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace can still be used after DAD has failed, because in ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE. Setting IFA_F_OPTIMISTIC from userspace is conditional on CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: Fix spelling mistake "greater then" -> "greater than"Gal Pressman
Fix trivial spelling mistake "greater then" -> "greater than". Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: ipv4: avoid unused variable warning for sysctlArnd Bergmann
The newly introudced ip_min_valid_pmtu variable is only used when CONFIG_SYSCTL is set: net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable] This moves it to the other variables like it, to avoid the harmless warning. Fixes: c7272c2f1229 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: prevent new connections on link groupKarsten Graul
When the processing of a DELETE LINK message has started, new connections should not be added to the link group that is about to terminate. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: process add/delete link messagesKarsten Graul
Add initial support for the LLC messages ADD LINK and DELETE LINK. Introduce a link state field. Extend the initial LLC handshake with ADD LINK processing. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: do not allow eyecatchers in rmbeKarsten Graul
SMC does not support eyecatchers in RMB elements, decline peers requesting this support. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: process confirm/delete rkey messagesKarsten Graul
Process and respond to CONFIRM RKEY and DELETE RKEY messages. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: respond to test link messagesKarsten Graul
Add TEST LINK message responses, which also serves as preparation for support of sockopt TCP_KEEPALIVE. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: remove unused fields from smc structuresKarsten Graul
The daddr field holds the destination IPv4 address. The field was set but never used and can be removed. The addr field was a left-over from an earlier version of non-blocking connects and can be removed. The result of the call to kernel_getpeername is not used, the call can be removed. Non-blocking connects are working, so remove restriction comment. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: move netinfo function to file smc_clc.cKarsten Graul
The function smc_netinfo_by_tcpsk() belongs to CLC handling. Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net/smc: cleanup smc_llc.h and smc_clc.h headersStefan Raspl
Remove structures used internal only from headers. And remove an extra function parameter. Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite dumproute flowsYuval Mintz
The various MFC entries are being held in the same kind of mr_tables for both ipmr and ip6mr, and their traversal logic is identical. Also, with the exception of the addresses [and other small tidbits] the major bulk of the nla setting is identical. Unite as much of the dumping as possible between the two. Notice this requires creating an mr_table iterator for each, as the for-each preprocessor macro can't be used by the common logic. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Remove MFC_NOTIFY and refactor flagsYuval Mintz
MFC_NOTIFY exists in ip6mr, probably as some legacy code [was already removed for ipmr in commit 06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum"). Remove it from ip6mr as well, and move the enum into a common file; Notice MFC_OFFLOAD is currently only used by ipmr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite vif seq functionsYuval Mintz
Same as previously done with the mfc seq, the logic for the vif seq is refactored to be shared between ipmr and ip6mr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite mfc seq logicYuval Mintz
With the exception of the final dump, ipmr and ip6mr have the exact same seq logic for traversing a given mr_table. Refactor that code and make it common. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite logic for searching in MFC cacheYuval Mintz
ipmr and ip6mr utilize the exact same methods for searching the hashed resolved connections, difference being only in the construction of the hash comparison key. In order to unite the flow, introduce an mr_table operation set that would contain the protocol specific information required for common flows, in this case - the hash parameters and a comparison key representing a (*,*) route. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Make mfc_cache a common structureYuval Mintz
mfc_cache and mfc6_cache are almost identical - the main difference is in the origin/group addresses and comparison-key. Make a common structure encapsulating most of the multicast routing logic - mr_mfc and convert both ipmr and ip6mr into using it. For easy conversion [casting, in this case] mr_mfc has to be the first field inside every multicast routing abstraction utilizing it. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite creation of new mr_tableYuval Mintz
Now that both ipmr and ip6mr are using the same mr_table structure, we can have a common function to allocate & initialize a new instance. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01mroute*: Make mr_table a common structYuval Mintz
Following previous changes to ip6mr, mr_table and mr6_table are basically the same [up to mr6_table having additional '6' suffixes to its variable names]. Move the common structure definition into a common header; This requires renaming all references in ip6mr to variables that had the distinct suffix. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Align hash implementation to ipmrYuval Mintz
Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has been using rhashtable as a basis for its mfc routes, but ip6mr is currently still using the old private MFC hash implementation. Align ip6mr to the current ipmr implementation. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ip6mr: Make mroute_sk rcu-basedYuval Mintz
In ipmr the mr_table socket is handled under RCU. Introduce the same for ip6mr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr,ipmr6: Define a uniform vif_deviceYuval Mintz
The two implementations have almost identical structures - vif_device and mif_device. As a step toward uniforming the mr_tables, eliminate the mif_device and relocate the vif_device definition into a new common header file. Also, introduce a common initializing function for setting most of the vif_device fields in a new common source file. This requires modifying the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common config option - CONFIG_IP_MROUTE_COMMON. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01xfrm: Fix ESN sequence number handling for IPsec GSO packets.Steffen Klassert
When IPsec offloading was introduced, we accidentally incremented the sequence number counter on the xfrm_state by one packet too much in the ESN case. This leads to a sequence number gap of one packet after each GSO packet. Fix this by setting the sequence number to the correct value. Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-28ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu
Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu
Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations (Thanks to Nikolay Aleksandrov for some review here). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv6: fib6_rules: support for match on sport, dport and ip protoRoopa Prabhu
support to match on src port, dst port and ip protocol. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipv4: fib_rules: support match on sport, dport and ip protoRoopa Prabhu
support to match on src port, dst port and ip protocol. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: fib_rules: support for match on ip_proto, sport and dportRoopa Prabhu
uapi for ip_proto, sport and dport range match in fib rules. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipvs: remove IPS_NAT_MASK check to fix passive FTPJulian Anastasov
The IPS_NAT_MASK check in 4.12 replaced previous check for nfct_nat() which was needed to fix a crash in 2.6.36-rc, see commit 7bcbf81a2296 ("ipvs: avoid oops for passive FTP"). But as IPVS does not set the IPS_SRC_NAT and IPS_DST_NAT bits, checking for IPS_NAT_MASK prevents PASV response to be properly mangled and blocks the transfer. Remove the check as it is not needed after 3.12 commit 41d73ec053d2 ("netfilter: nf_conntrack: make sequence number adjustments usuable without NAT") which changes nfct_nat() with nfct_seqadj() and especially after 3.13 commit b25adce16064 ("ipvs: correct usage/allocation of seqadj ext in ipvs"). Thanks to Li Shuang and Florian Westphal for reporting the problem! Reported-by: Li Shuang <shuali@redhat.com> Fixes: be7be6e161a2 ("netfilter: ipvs: fix incorrect conflict resolution") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-28mlxsw: spectrum: Fix handling of resource_size_paramJiri Pirko
Current code uses global variables, adjusts them and passes pointer down to devlink. With every other mlxsw_core instance, the previously passed pointer values are rewritten. Fix this by de-globalize the variables and also memcpy size_params during devlink resource registration. Also, introduce a convenient size_param_init helper. Fixes: ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/smc: fix NULL pointer dereference on sock_create_kern() error pathDavide Caratti
when sock_create_kern(..., a) returns an error, 'a' might not be a valid pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and and a->sk->sk_rcvbuf; not doing that caused the following crash: general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746 RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020 RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0 FS: 0000000001afb880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sock_create+0x4d4/0x850 net/socket.c:1285 sock_create net/socket.c:1325 [inline] SYSC_socketpair net/socket.c:1409 [inline] SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x4404b9 RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031 R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00 RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8 Fixes: cd6851f30386 smc: remote memory buffers (RMBs) Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/smc: use link_id of server in confirm link replyKarsten Graul
The CONFIRM LINK reply message must contain the link_id sent by the server. And set the link_id explicitly when initializing the link. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/smc: use a constant for control message lengthKarsten Graul
The sizeof(struct smc_cdc_msg) evaluates to 48 bytes instead of the required 44 bytes. We need to use the constant value of SMC_WR_TX_SIZE to set and check the control message length. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: prio: Add offload ability for grafting a childNogah Frankel
Offload sch_prio graft command for capable drivers. Warn in case of a failure, unless the graft was done as part of a destroy operation (the new qdisc is a noop) or if all the qdiscs (the parent, the old child, and the new one) are not offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bandsNogah Frankel
When the number the bands of sch_prio is decreased, child qdiscs on the deleted bands would get deleted as well. This change and deletions are being done under sch_tree_lock of the sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if it is offloaded. Un-offloading can't be done inside this lock. Move the offload command to be done before reducing the number of bands, so unoffloading of the qdiscs that are about to be deleted could be done outside of the lock. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscsNogah Frankel
Offloaded qdiscs are allowed to expose only parts of their statistics. It means that if backlog is being exposed and qlen is not, it might trigger a warning in qdisc_tree_reduce_backlog. Do not warn in case the qdisc that was removed was an offloaded one. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/tcp/illinois: replace broken algorithm reference linkJoey Pabalinas
The link to the pdf containing the algorithm description is now a dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been moved to https://sites.google.com/a/illinois.edu/srikant/ and none of the original papers can be found there... I have replaced it with the only working copy I was able to find. n.b. there is also a copy available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf However, this seems to only be a *cached* version, so I am unsure exactly how reliable that link can be expected to remain over time and have decided against using that one. Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com> 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28inet: whitespace cleanupStephen Hemminger
Ran simple script to find/remove trailing whitespace and blank lines at EOF because that kind of stuff git whines about and editors leave behind. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: purge write queue upon RSTSoheil Hassas Yeganeh
When the connection is reset, there is no point in keeping the packets on the write queue until the connection is closed. RFC 793 (page 70) and RFC 793-bis (page 64) both suggest purging the write queue upon RST: https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07 Moreover, this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is reset. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO extension to detect more spurious timeoutsYuchung Cheng
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427. While the patch could detect more spurious timeouts, it could cause poor TCP performance on broken middle-boxes that modifies TCP packets (e.g. receive window, SACK options). Since the performance gain is much smaller compared to the potential loss. The best solution is to fully revert the change. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO middle-box workaroundYuchung Cheng
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing some broken middle-boxes that modifies receive window fields, it does not address middle-boxes that strip off SACK options. The best solution is to fully revert this patch and the root F-RTO enhancement. Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27ip_tunnel: Rename & publish init_tunnel_flowPetr Machata
Initializing struct flowi4 is useful for drivers that need to emulate routing decisions made by a tunnel interface. Publish the function (appropriately renamed) so that the drivers in question don't need to cut'n'paste it around. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: GRE: Add is_gretap_dev, is_ip6gretap_devPetr Machata
Determining whether a device is a GRE device is easily done by inspecting struct net_device.type. However, for the tap variants, the type is just ARPHRD_ETHER. Therefore introduce two predicate functions that use netdev_ops to tell the tap devices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27tls: Use correct sk->sk_prot for IPV6Boris Pismenny
The tls ulp overrides sk->prot with a new tls specific proto structs. The tls specific structs were previously based on the ipv4 specific tcp_prot sturct. As a result, attaching the tls ulp to an ipv6 tcp socket replaced some ipv6 callback with the ipv4 equivalents. This patch adds ipv6 tls proto structs and uses them when attached to ipv6 sockets. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27sit: fix IFLA_MTU ignored on NEWLINKXin Long
Commit 128bb975dc3c ("ip6_gre: init dev->mtu and dev->hard_header_len correctly") fixed IFLA_MTU ignored on NEWLINK for ip6_gre. The same mtu fix is also needed for sit. Note that dev->hard_header_len setting for sit works fine, no need to fix it. sit is actually ipv4 tunnel, it can't call ip6_tnl_change_mtu to set mtu. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27ip6_tunnel: fix IFLA_MTU ignored on NEWLINKXin Long
Commit 128bb975dc3c ("ip6_gre: init dev->mtu and dev->hard_header_len correctly") fixed IFLA_MTU ignored on NEWLINK for ip6_gre. The same mtu fix is also needed for ip6_tunnel. Note that dev->hard_header_len setting for ip6_tunnel works fine, no need to fix it. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27ip_gre: fix IFLA_MTU ignored on NEWLINKXin Long
It's safe to remove the setting of dev's needed_headroom and mtu in __gre_tunnel_init, as discussed in [1], ip_tunnel_newlink can do it properly. Now Eric noticed that it could cover the mtu value set in do_setlink when creating a ip_gre dev. It makes IFLA_MTU param not take effect. So this patch is to remove them to make IFLA_MTU work, as in other ipv4 tunnels. [1]: https://patchwork.ozlabs.org/patch/823504/ Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: Eric Garver <e@erig.me> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>