summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-07-28tls: Remove dead code in tls_sw_sendmsgDoron Roberts-Kedes
tls_push_record either returns 0 on success or a negative value on failure. This patch removes code that would only be executed if tls_push_record were to return a positive value. Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28tcp_bbr: fix bw probing to raise in-flight data for very small BDPsNeal Cardwell
For some very small BDPs (with just a few packets) there was a quantization effect where the target number of packets in flight during the super-unity-gain (1.25x) phase of gain cycling was implicitly truncated to a number of packets no larger than the normal unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow scenarios some flows could get stuck with a lower bandwidth, because they did not push enough packets inflight to discover that there was more bandwidth available. This was really only an issue in multi-flow LAN scenarios, where RTTs and BDPs are low enough for this to be an issue. This fix ensures that gain cycling can raise inflight for small BDPs by ensuring that in PROBE_BW mode target inflight values with a super-unity gain are always greater than inflight values with a gain <= 1. Importantly, this applies whether the inflight value is calculated for use as a cwnd value, or as a target inflight value for the end of the super-unity phase in bbr_is_next_cycle_phase() (both need to be bigger to ensure we can probe with more packets in flight reliably). This is a candidate fix for stable releases. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28net: socket: Fix potential spectre v1 gadget in sock_is_registeredJeremy Cline
'family' can be a user-controlled value, so sanitize it after the bounds check to avoid speculative out-of-bounds access. Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28net: socket: fix potential spectre v1 gadget in socketcallJeremy Cline
'call' is a user-controlled value, so sanitize the array index after the bounds check to avoid speculating past the bounds of the 'nargs' array. Found with the help of Smatch: net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue 'nargs' [r] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-07-28 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) API fixes for libbpf's BTF mapping of map key/value types in order to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR() markings, from Martin. 2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached consumer pointer of the RX queue, from Björn. 3) Fix __xdp_return() to check for NULL pointer after the rhashtable lookup that retrieves the allocator object, from Taehee. 4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue by 4 bytes which got removed from overall stack usage, from Wang. 5) Fix bpf_skb_load_bytes_relative() length check to use actual packet length, from Daniel. 6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple() handler, from Thomas. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28ipv4: remove BUG_ON() from fib_compute_spec_dstLorenzo Bianconi
Remove BUG_ON() from fib_compute_spec_dst routine and check in_dev pointer during flowi4 data structure initialization. fib_compute_spec_dst routine can be run concurrently with device removal where ip_ptr net_device pointer is set to NULL. This can happen if userspace enables pkt info on UDP rx socket and the device is removed while traffic is flowing Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()Taehee Yoo
bpf_parse_prog() is protected by rcu_read_lock(). so that GFP_KERNEL is not allowed in the bpf_parse_prog(). [51015.579396] ============================= [51015.579418] WARNING: suspicious RCU usage [51015.579444] 4.18.0-rc6+ #208 Not tainted [51015.579464] ----------------------------- [51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section! [51015.579510] other info that might help us debug this: [51015.579532] rcu_scheduler_active = 2, debug_locks = 1 [51015.579556] 2 locks held by ip/1861: [51015.579577] #0: 00000000a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910 [51015.579711] #1: 00000000bf815f8e (rcu_read_lock){....}, at: lwtunnel_build_state+0x96/0x390 [51015.579842] stack backtrace: [51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208 [51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [51015.579911] Call Trace: [51015.579950] dump_stack+0x74/0xbb [51015.580000] ___might_sleep+0x16b/0x3a0 [51015.580047] __kmalloc_track_caller+0x220/0x380 [51015.580077] kmemdup+0x1c/0x40 [51015.580077] bpf_parse_prog+0x10e/0x230 [51015.580164] ? kasan_kmalloc+0xa0/0xd0 [51015.580164] ? bpf_destroy_state+0x30/0x30 [51015.580164] ? bpf_build_state+0xe2/0x3e0 [51015.580164] bpf_build_state+0x1bb/0x3e0 [51015.580164] ? bpf_parse_prog+0x230/0x230 [51015.580164] ? lock_is_held_type+0x123/0x1a0 [51015.580164] lwtunnel_build_state+0x1aa/0x390 [51015.580164] fib_create_info+0x1579/0x33d0 [51015.580164] ? sched_clock_local+0xe2/0x150 [51015.580164] ? fib_info_update_nh_saddr+0x1f0/0x1f0 [51015.580164] ? sched_clock_local+0xe2/0x150 [51015.580164] fib_table_insert+0x201/0x1990 [51015.580164] ? lock_downgrade+0x610/0x610 [51015.580164] ? fib_table_lookup+0x1920/0x1920 [51015.580164] ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0 [51015.580164] ? rtm_to_fib_config+0x637/0xbd0 [51015.580164] inet_rtm_newroute+0xed/0x1b0 [51015.580164] ? rtm_to_fib_config+0xbd0/0xbd0 [51015.580164] rtnetlink_rcv_msg+0x331/0x910 [ ... ] Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-28bpf: fix bpf_skb_load_bytes_relative pkt length checkDaniel Borkmann
The len > skb_headlen(skb) cannot be used as a maximum upper bound for the packet length since it does not have any relation to the full linear packet length when filtering is used from upper layers (e.g. in case of reuseport BPF programs) as by then skb->data, skb->len already got mangled through __skb_pull() and others. Fixes: 4e1ec56cdc59 ("bpf: add skb_load_bytes_relative helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com>
2018-07-27net: tipc: bcast: Replace GFP_ATOMIC with GFP_KERNEL in tipc_bcast_init()Jia-Ju Bai
tipc_bcast_init() is never called in atomic context. It calls kzalloc() with GFP_ATOMIC, which is not necessary. GFP_ATOMIC can be replaced with GFP_KERNEL. This is found by a static analysis tool named DCNS written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: tipc: name_table: Replace GFP_ATOMIC with GFP_KERNEL in tipc_nametbl_init()Jia-Ju Bai
tipc_nametbl_init() is never called in atomic context. It calls kzalloc() with GFP_ATOMIC, which is not necessary. GFP_ATOMIC can be replaced with GFP_KERNEL. This is found by a static analysis tool named DCNS written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27sch_cake: Make gso-splitting configurable from userspaceDave Taht
This patch restores cake's deployed behavior at line rate to always split gso, and makes gso splitting configurable from userspace. running cake unlimited (unshaped) at 1gigE, local traffic: no-split-gso bql limit: 131966 split-gso bql limit: ~42392-45420 On this 4 stream test splitting gso apart results in halving the observed interpacket latency at no loss in throughput. Summary of tcp_nup test run 'gso-split' (at 2018-07-26 16:03:51.824728): Ping (ms) ICMP : 0.83 0.81 ms 341 TCP upload avg : 235.43 235.39 Mbits/s 301 TCP upload sum : 941.71 941.56 Mbits/s 301 TCP upload::1 : 235.45 235.43 Mbits/s 271 TCP upload::2 : 235.45 235.41 Mbits/s 289 TCP upload::3 : 235.40 235.40 Mbits/s 288 TCP upload::4 : 235.41 235.40 Mbits/s 291 verses Summary of tcp_nup test run 'no-split-gso' (at 2018-07-26 16:37:23.563960): avg median # data pts Ping (ms) ICMP : 1.67 1.73 ms 348 TCP upload avg : 234.56 235.37 Mbits/s 301 TCP upload sum : 938.24 941.49 Mbits/s 301 TCP upload::1 : 234.55 235.38 Mbits/s 285 TCP upload::2 : 234.57 235.37 Mbits/s 286 TCP upload::3 : 234.58 235.37 Mbits/s 274 TCP upload::4 : 234.54 235.42 Mbits/s 288 Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: drop ->mru from struct l2tp_sessionGuillaume Nault
This field is not used. Treat PPPIOC*MRU the same way as PPPIOC*FLAGS: "get" requests return 0, while "set" requests vadidate the user supplied pointer but discard its value. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: drop ->flags from struct pppol2tp_sessionGuillaume Nault
This field is not used. Keep validating user input in PPPIOCSFLAGS. Even though we discard the value, it would look wrong to succeed if an invalid address was passed from userspace. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_VLAN_ID netlink attributeGuillaume Nault
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_DATA_SEQ netlink attributeGuillaume Nault
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net/rds/Kconfig: Correct the RDS dependsAnders Roxell
Remove prefix 'CONFIG_' from CONFIG_IPV6 Fixes: ba7d7e2677c0 ("net/rds/Kconfig: RDS should depend on IPV6") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: dcb: Add priority-to-DSCP map gettersPetr Machata
On ingress, a network device such as a switch assigns to packets priority based on various criteria. Common options include interpreting PCP and DSCP fields according to user configuration. When a packet egresses the switch, a reverse process may rewrite PCP and/or DSCP values according to packet priority. The following three functions support a) obtaining a DSCP-to-priority map or vice versa, and b) finding default-priority entries in APP database. The DCB subsystem supports for APP entries a very generous M:N mapping between priorities and protocol identifiers. Understandably, several (say) DSCP values can map to the same priority. But this asymmetry holds the other way around as well--one priority can map to several DSCP values. For this reason, the following functions operate in terms of bitmaps, with ones in positions that match some APP entry. - dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice a map of DSCP-to-priority-mask, which gives for each DSCP value a bitmap of priorities related to that DSCP value by APP, along the lines of dcb_ieee_getapp_mask(). - dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given netdevice a map from priorities to a bitmap of DSCPs. - dcb_ieee_getapp_default_prio_mask() which finds all default-priority rules for a given port in APP database, and returns a mask of priorities allowed by these default-priority rules. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: dcb: For wild-card lookups, use priority -1, not 0Petr Machata
The function dcb_app_lookup walks the list of specified DCB APP entries, looking for one that matches a given criteria: ifindex, selector, protocol ID and optionally also priority. The "don't care" value for priority is set to 0, because that priority has not been allowed under CEE regime, which predates the IEEE standardization. Under IEEE, 0 is a valid priority number. But because dcb_app_lookup considers zero a wild card, attempts to add an APP entry with priority 0 fail when other entries exist for a given ifindex / selector / PID triplet. Fix by changing the wild-card value to -1. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: sched: don't dump chains only held by actionsJiri Pirko
In case a chain is empty and not explicitly created by a user, such chain should not exist. The only exception is if there is an action "goto chain" pointing to it. In that case, don't show the chain in the dump. Track the chain references held by actions and use them to find out if a chain should or should not be shown in chain dump. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-07-27 1) Extend the output_mark to also support the input direction and masking the mark values before applying to the skb. 2) Add a new lookup key for the upcomming xfrm interfaces. 3) Extend the xfrm lookups to match xfrm interface IDs. 4) Add virtual xfrm interfaces. The purpose of these interfaces is to overcome the design limitations that the existing VTI devices have. The main limitations that we see with the current VTI are the following: VTI interfaces are L3 tunnels with configurable endpoints. For xfrm, the tunnel endpoint are already determined by the SA. So the VTI tunnel endpoints must be either the same as on the SA or wildcards. In case VTI tunnel endpoints are same as on the SA, we get a one to one correlation between the SA and the tunnel. So each SA needs its own tunnel interface. On the other hand, we can have only one VTI tunnel with wildcard src/dst tunnel endpoints in the system because the lookup is based on the tunnel endpoints. The existing tunnel lookup won't work with multiple tunnels with wildcard tunnel endpoints. Some usecases require more than on VTI tunnel of this type, for example if somebody has multiple namespaces and every namespace requires such a VTI. VTI needs separate interfaces for IPv4 and IPv6 tunnels. So when routing to a VTI, we have to know to which address family this traffic class is going to be encapsulated. This is a lmitation because it makes routing more complex and it is not always possible to know what happens behind the VTI, e.g. when the VTI is move to some namespace. VTI works just with tunnel mode SAs. We need generic interfaces that ensures transfomation, regardless of the xfrm mode and the encapsulated address family. VTI is configured with a combination GRE keys and xfrm marks. With this we have to deal with some extra cases in the generic tunnel lookup because the GRE keys on the VTI are actually not GRE keys, the GRE keys were just reused for something else. All extensions to the VTI interfaces would require to add even more complexity to the generic tunnel lookup. So to overcome this, we developed xfrm interfaces with the following design goal: It should be possible to tunnel IPv4 and IPv6 through the same interface. No limitation on xfrm mode (tunnel, transport and beet). Should be a generic virtual interface that ensures IPsec transformation, no need to know what happens behind the interface. Interfaces should be configured with a new key that must match a new policy/SA lookup key. The lookup logic should stay in the xfrm codebase, no need to change or extend generic routing and tunnel lookups. Should be possible to use IPsec hardware offloads of the underlying interface. 5) Remove xfrm pcpu policy cache. This was added after the flowcache removal, but it turned out to make things even worse. From Florian Westphal. 6) Allow to update the set mark on SA updates. From Nathan Harold. 7) Convert some timestamps to time64_t. From Arnd Bergmann. 8) Don't check the offload_handle in xfrm code, it is an opaque data cookie for the driver. From Shannon Nelson. 9) Remove xfrmi interface ID from flowi. After this pach no generic code is touched anymore to do xfrm interface lookups. From Benedict Wong. 10) Allow to update the xfrm interface ID on SA updates. From Nathan Harold. 11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle. From YueHaibing. 12) Return more detailed errors on xfrm interface creation. From Benedict Wong. 13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR. From the kbuild test robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-07-27 1) Fix PMTU handling of vti6. We update the PMTU on the xfrm dst_entry which is not cached anymore after the flowchache removal. So update the PMTU of the original dst_entry instead. From Eyal Birger. 2) Fix a leak of kernel memory to userspace. From Eric Dumazet. 3) Fix a possible dst_entry memleak in xfrm_lookup_route. From Tommi Rantala. 4) Fix a skb leak in case we can't call nlmsg_multicast from xfrm_nlmsg_multicast. From Florian Westphal. 5) Fix a leak of a temporary buffer in the error path of esp6_input. From Zhen Lei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27xfrm: fix ptr_ret.cocci warningskbuild test robot
net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: 44e2b838c24d ("xfrm: Return detailed errors from xfrmi_newlink") CC: Benedict Wong <benedictwong@google.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-27xdp: add NULL pointer check in __xdp_return()Taehee Yoo
rhashtable_lookup() can return NULL. so that NULL pointer check routine should be added. Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-26net: sched: unmark chain as explicitly created on deleteJiri Pirko
Once user manually deletes the chain using "chain del", the chain cannot be marked as explicitly created anymore. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26tls: Skip zerocopy path for ITER_KVECDoron Roberts-Kedes
The zerocopy path ultimately calls iov_iter_get_pages, which defines the step function for ITER_KVECs as simply, return -EFAULT. Taking the non-zerocopy path for ITER_KVECs avoids the unnecessary fallback. See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u for a discussion of why zerocopy for vmalloc data is not a good idea. Discovered while testing NBD traffic encrypted with ktls. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: sched: cls_api: fix dead code in switchGustavo A. R. Silva
Code at line 1850 is unreachable. Fix this by removing the break statement above it, so the code for case RTM_GETCHAIN can be properly executed. Addresses-Coverity-ID: 1472050 ("Structurally dead code") Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26l2tp: remove ->recv_payload_hookGuillaume Nault
The tunnel reception hook is only used by l2tp_ppp for skipping PPP framing bytes. This is a session specific operation, but once a PPP session sets ->recv_payload_hook on its tunnel, all frames received by the tunnel will enter pppol2tp_recv_payload_hook(), including those targeted at Ethernet sessions (an L2TPv3 tunnel can multiplex PPP and Ethernet sessions). So this mechanism is wrong, and uselessly complex. Let's just move this functionality to the pppol2tp rx handler and drop ->recv_payload_hook. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26tipc: add missing dev_put() on error in tipc_enable_l2_mediaYueHaibing
when tipc_own_id failed to obtain node identity,dev_put should be call before return -EINVAL. Fixes: 682cd3cf946b ("tipc: confgiure and apply UDP bearer MTU on running links") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26RDS: RDMA: Fix the NULL-ptr deref in rds_ib_get_mrAvinash Repaka
Registration of a memory region(MR) through FRMR/fastreg(unlike FMR) needs a connection/qp. With a proxy qp, this dependency on connection will be removed, but that needs more infrastructure patches, which is a work in progress. As an intermediate fix, the get_mr returns EOPNOTSUPP when connection details are not populated. The MR registration through sendmsg() will continue to work even with fast registration, since connection in this case is formed upfront. This patch fixes the following crash: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: 0018:ffff8801b059f890 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068 RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200 R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200 R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000 FS: 00007f4d050af700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271 rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357 rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4456d9 RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9 RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004 RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000 R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005 Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00 RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: ffff8801b059f890 ---[ end trace 7e1cea13b85473b0 ]--- Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net/tls: Removed redundant checks for non-NULLVakul Garg
Removed checks against non-NULL before calling kfree_skb() and crypto_free_aead(). These functions are safe to be called with NULL as an argument. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: rollback orig value on failure of dev_qdisc_change_tx_queue_lenTariq Toukan
Fix dev_change_tx_queue_len so it rolls back original value upon a failure in dev_qdisc_change_tx_queue_len. This is already done for notifirers' failures, share the code. In case of failure in dev_qdisc_change_tx_queue_len, some tx queues would still be of the new length, while they should be reverted. Currently, the revert is not done, and is marked with a TODO label in dev_qdisc_change_tx_queue_len, and should find some nice solution to do it. Yet it is still better to not apply the newly requested value. Fixes: 48bfd55e7e41 ("net_sched: plug in qdisc ops change_tx_queue_len") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Ran Rozenstein <ranro@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26cbs: Add support for the graft functionVinicius Costa Gomes
This will allow to install a child qdisc under cbs. The main use case is to install ETF (Earliest TxTime First) qdisc under cbs, so there's another level of control for time-sensitive traffic. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25rds: send: Fix dead code in rds_sendmsgGustavo A. R. Silva
Currently, code at label *out* is unreachable. Fix this by updating variable *ret* with -EINVAL, so the jump to *out* can be properly executed instead of directly returning from function. Addresses-Coverity-ID: 1472059 ("Structurally dead code") Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/rds/Kconfig: RDS should depend on IPV6Anders Roxell
Build error, implicit declaration of function __inet6_ehashfn shows up When RDS is enabled but not IPV6. net/rds/connection.c: In function ‘rds_conn_bucket’: net/rds/connection.c:67:9: error: implicit declaration of function ‘__inet6_ehashfn’; did you mean ‘__inet_ehashfn’? [-Werror=implicit-function-declaration] hash = __inet6_ehashfn(lhash, 0, fhash, 0, rds_hash_secret); ^~~~~~~~~~~~~~~ __inet_ehashfn Current code adds IPV6 as a depends on in config RDS. Fixes: eee2fa6ab322 ("rds: Changing IP address internal representation to struct in6_addr") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/smc: improve delete link processingKarsten Graul
Send an orderly DELETE LINK request before termination of a link group, add support for client triggered DELETE LINK processing. And send a disorderly DELETE LINK before module is unloaded. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/smc: provide fallback reason codeKarsten Graul
Remember the fallback reason code and the peer diagnosis code for smc sockets, and provide them in smc_diag.c to the netlink interface. And add more detailed reason codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/smc: use correct vlan gid of RoCE deviceUrsula Braun
SMC code uses the base gid for VLAN traffic. The gids exchanged in the CLC handshake and the gid index used for the QP have to switch from the base gid to the appropriate vlan gid. When searching for a matching IB device port for a certain vlan device, it does not make sense to return an IB device port, which is not enabled for the used vlan_id. Add another check whether a vlan gid exists for a certain IB device port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/smc: fewer parameters for smc_llc_send_confirm_link()Ursula Braun
Link confirmation will always be sent across the new link being confirmed. This allows to shrink the parameter list. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26xfrm: Return detailed errors from xfrmi_newlinkBenedict Wong
Currently all failure modes of xfrm interface creation return EEXIST. This change improves the granularity of errnos provided by also returning ENODEV or EINVAL if failures happen in looking up the underlying interface, or a required parameter is not provided. This change has been tested against the Android Kernel Networking Tests, with additional xfrmi_newlink tests here: https://android-review.googlesource.com/c/kernel/tests/+/715755 Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-26xfrm: fix 'passing zero to ERR_PTR()' warningYueHaibing
Fix a static code checker warning: net/xfrm/xfrm_policy.c:1836 xfrm_resolve_and_create_bundle() warn: passing zero to 'ERR_PTR' xfrm_tmpl_resolve return 0 just means no xdst found, return NULL instead of passing zero to ERR_PTR. Fixes: d809ec895505 ("xfrm: do not assume that template resolving always returns xfrms") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-26xsk: fix poll/POLLIN premature returnsBjörn Töpel
Polling for the ingress queues relies on reading the producer/consumer pointers of the Rx queue. Prior this commit, a cached consumer pointer could be used, instead of the actual consumer pointer and therefore report POLLIN prematurely. This patch makes sure that the non-cached consumer pointer is used instead. Reported-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Qi Zhang <qi.z.zhang@intel.com> Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-25net: igmp: make function __ip_mc_inc_group() staticWei Yongjun
Fixes the following sparse warnings: net/ipv4/igmp.c:1391:6: warning: symbol '__ip_mc_inc_group' was not declared. Should it be static? Fixes: 6e2059b53f98 ("ipv4/igmp: init group mode as INCLUDE when join source group") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25tcp: make function tcp_retransmit_stamp() staticWei Yongjun
Fixes the following sparse warnings: net/ipv4/tcp_timer.c:25:5: warning: symbol 'tcp_retransmit_stamp' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/sched: cls_flower: Use correct inline function for assignment of vlan tpidJianbo Liu
This fixes the following sparse warning: net/sched/cls_flower.c:1356:36: warning: incorrect type in argument 3 (different base types) net/sched/cls_flower.c:1356:36: expected unsigned short [unsigned] [usertype] value net/sched/cls_flower.c:1356:36: got restricted __be16 [usertype] vlan_tpid Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25tcp: ack immediately when a cwr packet arrivesLawrence Brakmo
We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The problem is triggered when the last packet of a request arrives CE marked. The reply will carry the ECE mark causing TCP to shrink its cwnd to 1 (because there are no packets in flight). When the 1st packet of the next request arrives, the ACK was sometimes delayed even though it is CWR marked, adding up to 40ms to the RPC latency. This patch insures that CWR marked data packets arriving will be acked immediately. Packetdrill script to reproduce the problem: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Modified based on comments by Neal Cardwell <ncardwell@google.com> Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-07-24ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pullWillem de Bruijn
Syzbot reported a read beyond the end of the skb head when returning IPV6_ORIGDSTADDR: BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242 CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261 copy_to_user include/linux/uaccess.h:184 [inline] put_cmsg+0x5ef/0x860 net/core/scm.c:242 ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719 ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733 rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521 [..] This logic and its ipv4 counterpart read the destination port from the packet at skb_transport_offset(skb) + 4. With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a packet that stores headers exactly up to skb_transport_offset(skb) in the head and the remainder in a frag. Call pskb_may_pull before accessing the pointer to ensure that it lies in skb head. Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24net/sched: add skbprio schedulerNishanth Devarajan
Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets according to their skb->priority field. Under congestion, already-enqueued lower priority packets will be dropped to make space available for higher priority packets. Skbprio was conceived as a solution for denial-of-service defenses that need to route packets with different priorities as a means to overcome DoS attacks. v5 *Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set default sch->limit to 64. v4 *Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man page for Skbprio, in iproute2. v3 *Drop max_limit parameter in struct skbprio_sched_data and instead use sch->limit. *Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for qdisc (previously being referenced every time qdisc changes). *Move qdisc's detailed description from in-code to Documentation/networking. *When qdisc is saturated, enqueue incoming packet first before dequeueing lowest priority packet in queue - improves usage of call stack registers. *Introduce and use overlimit stat to keep track of number of dropped packets. v2 *Use skb->priority field rather than DS field. Rename queueing discipline as SKB Priority Queue (previously Gatekeeper Priority Queue). *Queueing discipline is made classful to expose Skbprio's internal priority queues. Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com> Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com> Reviewed-by: Cody Doucette <doucette@bu.edu> Reviewed-by: Michel Machado <michel@digirati.com.br> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24net: remove blank lines at end of fileStephen Hemminger
Several files have extra line at end of file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24l2tp: remove trailing newlineStephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>