summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-08-01net: sched: change name of zombie chain to "held_by_acts_only"Jiri Pirko
As mentioned by Cong and Jakub during the review process, it is a bit odd to sometimes (act flow) create a new chain which would be immediately a "zombie". So just rename it to "held_by_acts_only". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: remove redundant variable 'rds_ibdev'YueHaibing
Variable 'rds_ibdev' is being assigned but never used, so can be removed. fix this clang warning: net/rds/ib_send.c:762:24: warning: variable ‘rds_ibdev’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01strparser: remove redundant variable 'rd_desc'YueHaibing
Variable 'rd_desc' is being assigned but never used, so can be removed. fix this clang warning: net/strparser/strparser.c:411:20: warning: variable ‘rd_desc’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01ip_gre: remove redundant variables t_hlenYueHaibing
After commit ffc2b6ee4174 ("ip_gre: fix IFLA_MTU ignored on NEWLINK") variable t_hlen is assigned values that are never read, hence they are redundant and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: remove set but not used variable 'skb_size'Wei Yongjun
Fixes gcc '-Wunused-but-set-variable' warning: net/ipv4/tcp_output.c: In function 'tcp_collapse_retrans': net/ipv4/tcp_output.c:2700:6: warning: variable 'skb_size' set but not used [-Wunused-but-set-variable] int skb_size, next_skb_size; ^ Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add stat of data packet reordering eventsWei Wang
Introduce a new TCP stats to record the number of reordering events seen and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Application can use this stats to track the frequency of the reordering events in addition to the existing reordering stats which tracks the magnitude of the latest reordering event. Note: this new stats tracks reordering events triggered by ACKs, which could often be fewer than the actual number of packets being delivered out-of-order. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add dsack blocks received statsWei Wang
Introduce a new TCP stat to record the number of DSACK blocks received (RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes retransmitted statsWei Wang
Introduce a new TCP stat to record the number of bytes retransmitted (RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes sent statsWei Wang
Introduce a new TCP stat to record the number of bytes sent (RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add a helper to calculate size of opt_statsWei Wang
This is to refactor the calculation of the size of opt_stats to a helper function to make the code cleaner and easier for later changes. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: dsa: Do not suspend/resume closed slave_devFlorian Fainelli
If a DSA slave network device was previously disabled, there is no need to suspend or resume it. Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Notify about changes to ip_forward_update_priorityPetr Machata
Drivers may make offloading decision based on whether ip_forward_update_priority is enabled or not. Therefore distribute netevent notifications to give them a chance to react to a change. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Control SKB reprioritization after forwardingPetr Machata
After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore introduce a sysctl that controls this behavior. Keep the default value at 1 to maintain backward-compatible behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01netlink: Fix spectre v1 gadget in netlink_create()Jeremy Cline
'protocol' is a user-controlled value, so sanitize it after the bounds check to avoid using it for speculative out-of-bounds access to arrays indexed by it. This addresses the following accesses detected with the help of smatch: * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_keys' [w] * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_key_strings' [w] * net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre issue 'nl_table' [w] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: add helpers checking if socket can be bound to nonlocal addressVincent Bernat
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind || inet->transparent" is present three times and its IPv6 counterpart is also present three times. We introduce two small helpers to characterize these tests uniformly. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net/tls: Use kmemdup to simplify the codezhong jiang
Kmemdup is better than kmalloc+memcpy. So replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net/tipc: remove redundant variables 'tn' and 'oport'Colin Ian King
Variables 'tn' and 'oport' are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warnings: warning: variable 'oport' set but not used [-Wunused-but-set-variable] warning: variable 'tn' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: Remove IPv6 dependencyKa-Cheong Poon
This patch removes the IPv6 dependency from RDS. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() insteadKa-Cheong Poon
Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches() without passing along the gfp_t flag. But rds_ib_recv_alloc_caches() and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the correct flag instead of calling alloc_percpu(). Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rxrpc: Transmit more ACKs during data receptionDavid Howells
Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() - which transfers data to the target buffers - if we previously had an Rx underrun (ie. we returned -EAGAIN because we ran out of received data). This lets the server know what we've managed to receive something. Also flush any outstanding ACK after calling the function if it hit -EAGAIN to let the server know we processed some data. It might be better to send more ACKs, possibly on a time-based scheme, but that needs some more consideration. With this and some additional AFS patches, it is possible to get large unencrypted O_DIRECT reads to be almost as fast as NFS over TCP. It looks like it might be theoretically possible to improve performance yet more for a server running a single operation as investigation of packet timestamps indicates that the server keeps stalling. The issue appears to be that rxrpc runs in to trouble with ACK packets getting batched together (up to ~32 at a time) somewhere between the IP transmit queue on the client and the ethernet receive queue on the server. However, this case isn't too much of a worry as even a lightly loaded server should be receiving sufficient packet flux to flush the ACK packets to the UDP socket. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Propose, but don't immediately transmit, the final ACK for a callDavid Howells
The final ACK that closes out an rxrpc call needs to be transmitted by the client unless we're going to follow up with a DATA packet for a new call on the same channel (which implicitly ACK's the previous call, thereby saving an ACK). Currently, we don't do that, so if no follow on call is immediately forthcoming, the server will resend the last DATA packet - at which point rxrpc_conn_retransmit_call() will be triggered and will (re)send the final ACK. But the server has to hold on to the last packet until the ACK is received, thereby holding up its resources. Fix the client side to propose a delayed final ACK, to be transmitted after a short delay, assuming the call isn't superseded by a new one. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Increase the size of a call's Rx windowDavid Howells
Increase the size of a call's Rx window from 32 to 63 - ie. one less than the size of the ring buffer. This makes large data transfers perform better when the Tx window on the other side is around 64 (as is the case with Auristor's YFS fileserver). If the server window size is ~32 or smaller, this should make no difference. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Trace socket notificationDavid Howells
Trace notifications from the softirq side of the socket to the process-context side. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Trace packet transmissionDavid Howells
Trace successful packet transmission (kernel_sendmsg() succeeded, that is) in AF_RXRPC. We can share the enum that defines the transmission points with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be applicable to both. Also, save the internal call->debug_id in the rxrpc_channel struct so that it can be used in retransmission trace lines. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Fix the trace for terminal ACK (re)transmissionDavid Howells
Fix the trace for terminal ACK (re)transmission to put in the right parameters. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Show some more information through /proc filesDavid Howells
Show the four current call IDs in /proc/net/rxrpc/conns. Show the current packet Rx serial number in /proc/net/rxrpc/calls. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Display call expect-receive-by timeout in procDavid Howells
Display in /proc/net/rxrpc/calls the timeout by which a call next expects to receive a packet. This makes it easier to debug timeout issues. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: remove redundant variables 'sp' and 'did_discard'YueHaibing
Variables 'sp' and 'did_discard' are being assigned, but are never used, hence they are redundant and can be removed. fix following warning: net/rxrpc/call_event.c:165:25: warning: variable 'sp' set but not used [-Wunused-but-set-variable] net/rxrpc/conn_client.c:1054:7: warning: variable 'did_discard' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01Bluetooth: hidp: buffer overflow in hidp_process_reportMark Salyzyn
CVE-2018-9363 The buffer length is unsigned at all layers, but gets cast to int and checked in hidp_process_report and can lead to a buffer overflow. Switch len parameter to unsigned int to resolve issue. This affects 3.18 and newer kernels. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Fixes: a4b1b5877b514b276f0f31efe02388a9c2836728 ("HID: Bluetooth: hidp: make sure input buffers are big enough") Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Kees Cook <keescook@chromium.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: security@kernel.org Cc: kernel-team@android.com Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-07-31ipv4: frags: handle possible skb truesize changeEric Dumazet
ip_frag_queue() might call pskb_pull() on one skb that is already in the fragment queue. We need to take care of possible truesize change, or we might have an imbalance of the netns frags memory usage. IPv6 is immune to this bug, because RFC5722, Section 4, amended by Errata ID 3089 states : When reassembling an IPv6 datagram, if one or more its constituent fragments is determined to be an overlapping fragment, the entire datagram (and any constituent fragments) MUST be silently discarded. Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31inet: frag: enforce memory limits earlierEric Dumazet
We currently check current frags memory usage only when a new frag queue is created. This allows attackers to first consume the memory budget (default : 4 MB) creating thousands of frag queues, then sending tiny skbs to exceed high_thresh limit by 2 to 3 order of magnitude. Note that before commit 648700f76b03 ("inet: frags: use rhashtables for reassembly units"), work queue could be starved under DOS, getting no cpu cycles. After commit 648700f76b03, only the per frag queue timer can eventually remove an incomplete frag queue and its skbs. Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Peter Oskolkov <posk@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31net: remove bogus RCU annotations on socket.wqChristoph Hellwig
We never use RCU protection for it, just a lot of cargo-cult rcu_deference_protects calls. Note that we do keep the kfree_rcu call for it, as the references through struct sock are RCU protected and thus might require a grace period before freeing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31sunrpc: whitespace fixesStephen Hemminger
Remove trailing whitespace and blank line at EOF Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31NFSv4 client live hangs after live data migration recoveryBill Baker
After a live data migration event at the NFS server, the client may send I/O requests to the wrong server, causing a live hang due to repeated recovery events. On the wire, this will appear as an I/O request failing with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly. NFS4ERR_BADSSESSION is returned because the session ID being used was issued by the other server and is not valid at the old server. The failure is caused by async worker threads having cached the transport (xprt) in the rpc_task structure. After the migration recovery completes, the task is redispatched and the task resends the request to the wrong server based on the old value still present in tk_xprt. The solution is to recompute the tk_xprt field of the rpc_task structure so that the request goes to the correct server. Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31sunrpc: kstrtoul() can also return -ERANGEDan Carpenter
Smatch complains that "num" can be uninitialized when kstrtoul() returns -ERANGE. It's true enough, but basically harmless in this case. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt ↵Dave Wysochanski
clones The existing rpc_print_iostats has a few shortcomings. First, the naming is not consistent with other functions in the kernel that display stats. Second, it is really displaying stats for an rpc_clnt structure as it displays both xprt stats and per-op stats. Third, it does not handle rpc_clnt clones, which is important for the one in-kernel tree caller of this function, the NFS client's nfs_show_stats function. Fix all of the above by renaming the rpc_print_iostats to rpc_clnt_show_stats and looping through any rpc_clnt clones via cl_parent. Once this interface is fixed, this addresses a problem with NFSv4. Before this patch, the /proc/self/mountstats always showed incorrect counts for NFSv4 lease and session related opcodes such as SEQUENCE, RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0 even though many ops would go over the wire. The reason for this is there are multiple rpc_clnt structures allocated for any given NFSv4 mount, and inside nfs_show_stats() we callled into rpc_print_iostats() which only handled one of them, nfs_server->client. Fix these counts by calling sunrpc's new rpc_clnt_show_stats() function, which handles cloned rpc_clnt structs and prints the stats together. Note that one side-effect of the above is that multiple mounts from the same NFS server will show identical counts in the above ops due to the fact the one rpc_clnt (representing the NFSv4 client state) is shared across mounts. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31xsk: don't allow umem replace at stack levelJakub Kicinski
Currently drivers have to check if they already have a umem installed for a given queue and return an error if so. Make better use of XDP_QUERY_XSK_UMEM and move this functionality to the core. We need to keep rtnl across the calls now. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31xsk: refactor xdp_umem_assign_dev()Jakub Kicinski
Return early and only take the ref on dev once there is no possibility of failing. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31bpf: Support bpf_get_socket_cookie in more prog typesAndrey Ignatov
bpf_get_socket_cookie() helper can be used to identify skb that correspond to the same socket. Though socket cookie can be useful in many other use-cases where socket is available in program context. Specifically BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS programs can benefit from it so that one of them can augment a value in a map prepared earlier by other program for the same socket. The patch adds support to call bpf_get_socket_cookie() from BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS. It doesn't introduce new helpers. Instead it reuses same helper name bpf_get_socket_cookie() but adds support to this helper to accept `struct bpf_sock_addr` and `struct bpf_sock_ops`. Documentation in bpf.h is changed in a way that should not break automatic generation of markdown. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-31lwt_bpf: remove unnecessary rcu_read_lock in run_lwt_bpfTaehee Yoo
run_lwt_bpf is called by bpf_{input/output/xmit}. These functions are already protected by rcu_read_lock. because lwtunnel_{input/output/xmit} holds rcu_read_lock and then calls bpf_{input/output/xmit}. So that rcu_read_lock in the run_lwt_bpf is unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-31bpf: add End.DT6 action to bpf_lwt_seg6_action helperMathieu Xhonneux
The seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header (SRH), full specification is available here: https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-05 This patch adds this action now to the seg6local BPF interface. Since it is not mandatory that the inner IPv6 header also contains a SRH, seg6_bpf_srh_state has been extended with a pointer to a possible SRH of the outermost IPv6 header. This helps assessing if the validation must be triggered or not, and avoids some calls to ipv6_find_hdr. v3: s/1/true, s/0/false for boolean values v2: - changed true/false -> 1/0 - preempt_enable no longer called in first conditional block Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-30RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments constBart Van Assche
Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31net: xsk: don't return frames via the allocator on errorJakub Kicinski
xdp_return_buff() is used when frame has been successfully handled (transmitted) or if an error occurred during delayed processing and there is no way to report it back to xdp_do_redirect(). In case of __xsk_rcv_zc() error is propagated all the way back to the driver, so there is no need to call xdp_return_buff(). Driver will recycle the frame anyway after seeing that error happened. Fixes: 173d3adb6f43 ("xsk: add zero-copy support for Rx") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-30netlink: Don't shift with UB on nlk->ngroupsDmitry Safonov
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in hang during boot. Check for 0 ngroups and use (unsigned long long) as a type to shift. Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups"). Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30sunrpc: Add _add_rpc_iostats() to add rpc_iostats metricsDave Wysochanski
Add a helper function to add the metrics in two rpc_iostats structures. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-30sunrpc: add _print_rpc_iostats() to output metrics for one RPC opDave Wysochanski
Refactor the output of the metrics for one RPC op into an internal function. No functional change. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-30net/sunrpc: Make rpc_auth_create_args a constSargun Dhillon
This turns rpc_auth_create_args into a const as it gets passed through the auth stack. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-30net/ipv6: fix metrics leakSabrina Dubroca
Since commit d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from() assigns the rt->from pointer and increases the refcount on from's metrics. This reference is never released. Introduce the fib6_metrics_release() helper and use it to release the metrics. Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30fib_rules: NULL check before kfree is not neededYueHaibing
kfree(NULL) is safe,so this removes NULL check before freeing the mem Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/tls: Use socket data_ready callback on record availabilityVakul Garg
On receipt of a complete tls record, use socket's saved data_ready callback instead of state_change callback. In function tls_queue(), the TLS record is queued in encrypted state. But the decryption happen inline when tls_sw_recvmsg() or tls_sw_splice_read() get invoked. So it should be ok to notify the waiting context about the availability of data as soon as we could collect a full TLS record. For new data availability notification, sk_data_ready callback is more appropriate. It points to sock_def_readable() which wakes up specifically for EPOLLIN event. This is in contrast to the socket callback sk_state_change which points to sock_def_wakeup() which issues a wakeup unconditionally (without event mask). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>