summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-05-18ipv6: lift copy_from_user out of ipv6_route_ioctlChristoph Hellwig
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18SUNRPC: Restructure svc_tcp_recv_record()Chuck Lever
Refactor: svc_recvfrom() is going to be converted to read into bio_vecs in a moment. Unhook the only other caller, svc_tcp_recv_record(), which just wants to read the 4-byte stream record marker into a kvec. While we're in the area, streamline this helper by straight-lining the hot path, replace dprintk call sites with tracepoints, and reduce the number of atomic bit operations in this path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Rename svc_sock::sk_reclenChuck Lever
Clean up. I find the name of the svc_sock::sk_reclen field confusing, so I've changed it to better reflect its function. This field is not read directly to get the record length. Rather, it is a buffer containing a record marker that needs to be decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Trace server-side rpcbind registration eventsChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Replace dprintk call sites in TCP state change calloutsChuck Lever
Report TCP socket state changes and accept failures via tracepoints, replacing dprintk() call sites. No tracepoint is added in svc_tcp_listen_data_ready. There's no information available there that isn't also reported by the svcsock_new_socket and the accept failure tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Add more svcsock tracepointsChuck Lever
In addition to tracing recently-updated socket sendto events, this commit adds a trace event class that can be used for additional svcsock-related tracepoints in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Remove "#include <trace/events/skb.h>"Chuck Lever
Clean up: Commit 850cbaddb52d ("udp: use it's own memory accounting schema") removed the last skb-related tracepoint from svcsock.c, so it is no longer necessary to include trace/events/skb.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Trace a few more generic svc_xprt eventsChuck Lever
In lieu of dprintks or tracepoints in each individual transport implementation, introduce tracepoints in the generic part of the RPC layer. These typically fire for connection lifetime events, so shouldn't contribute a lot of noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Tracepoint to record errors in svc_xpo_create()Chuck Lever
Capture transport creation failures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Add tracepoints to report ->xpo_accept failuresChuck Lever
Failure to accept a connection is typically due to a problem specific to a transport type. Also, ->xpo_accept returns NULL on error rather than reporting a specific problem. So, add failure-specific tracepoints in svc_rdma_accept(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Displayed remote IP address should match stored addressChuck Lever
Clean up: After commit 1e091c3bbf51 ("svcrdma: Ignore source port when computing DRC hash"), the IP address stored in xpt_remote always has a port number of zero. Thus, there's no need to display the port number when displaying the IP address of a remote NFS/RDMA client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Rename tracepoints that record header decoding errorsChuck Lever
Clean up: Use a consistent naming convention so that these trace points can be enabled quickly via a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Remove backchannel dprintk call sitesChuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Fix backchannel return codeChuck Lever
Way back when I was writing the RPC/RDMA server-side backchannel code, I misread the TCP backchannel reply handler logic. When svc_tcp_recvfrom() successfully receives a backchannel reply, it does not return -EAGAIN. It sets XPT_DATA and returns zero. Update svc_rdma_recvfrom() to return zero. Here, XPT_DATA doesn't need to be set again: it is set whenever a new message is received, behind a spin lock in a single threaded context. Also, if handling the cb reply is not successful, the message is simply dropped. There's no special message framing to deal with as there is in the TCP case. Now that the handle_bc_reply() return value is ignored, I've removed the dprintk call sites in the error exit of handle_bc_reply() in favor of trace points in other areas that already report the error cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: trace undersized Write chunksChuck Lever
Clean up: Replace a dprintk call site. This is the last remaining dprintk call site in svc_rdma_rw.c, so remove dprintk infrastructure as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Trace page overruns when constructing RDMA ReadsChuck Lever
Clean up: Replace a dprintk call site with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Clean up handling of get_rw_ctx errorsChuck Lever
Clean up: Replace two dprintk call sites with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Clean up the tracing for rw_ctx_init errorsChuck Lever
- De-duplicate code - Rename the tracepoint with "_err" to allow enabling via glob - Report the sg_cnt for the failing rw_ctx - Fix a dumb signage issue Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18SUNRPC: Move xpt_mutex into socket xpo_sendto methodsChuck Lever
It appears that the RPC/RDMA transport does not need serialization of calls to its xpo_sendto method. Move the mutex into the socket methods that still need that serialization. Tail latencies are unambiguously better with this patch applied. fio randrw 8KB 70/30 on NFSv3, smaller numbers are better: clat percentiles (usec): With xpt_mutex: r | 99.99th=[ 8848] w | 99.99th=[ 9634] Without xpt_mutex: r | 99.99th=[ 8586] w | 99.99th=[ 8979] Serializing the construction of RPC/RDMA transport headers is not really necessary at this point, because the Linux NFS server implementation never changes its credit grant on a connection. If that should change, then svc_rdma_sendto will need to serialize access to the transport's credit grant fields. Reported-by: kbuild test robot <lkp@intel.com> [ cel: fix uninitialized variable warning ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18Bluetooth: Add SCO fallback for invalid LMP parameters errorHsin-Yu Chao
Bluetooth PTS test case HFP/AG/ACC/BI-12-I accepts SCO connection with invalid parameter at the first SCO request expecting AG to attempt another SCO request with the use of "safe settings" for given codec, base on section 5.7.1.2 of HFP 1.7 specification. This patch addresses it by adding "Invalid LMP Parameters" (0x1e) to the SCO fallback case. Verified with below log: < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Invalid LMP Parameters / Invalid LL Parameters (0x1e) Handle: 0 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x00 Retransmission window: 0x02 RX packet length: 0 TX packet length: 0 Air mode: Transparent (0x03) < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 8 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c8 EV3 may be used 2-EV3 may not be used 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 5 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Success (0x00) Handle: 257 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x06 Retransmission window: 0x04 RX packet length: 30 TX packet length: 30 Air mode: Transparent (0x03) Signed-off-by: Hsin-Yu Chao <hychao@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-05-18Bluetooth: Fix for GAP/SEC/SEM/BI-10-CŁukasz Rymanowski
Security Mode 1 level 4, force us to use have key size 16 octects long. This patch adds check for that. This is required for the qualification test GAP/SEC/SEM/BI-10-C Logs from test when ATT is configured with sec level BT_SECURITY_FIPS < ACL Data TX: Handle 3585 flags 0x00 dlen 11 #28 [hci0] 3.785965 SMP: Pairing Request (0x01) len 6 IO capability: DisplayYesNo (0x01) OOB data: Authentication data not present (0x00) Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d) Max encryption key size: 16 Initiator key distribution: EncKey Sign (0x05) Responder key distribution: EncKey IdKey Sign (0x07) > ACL Data RX: Handle 3585 flags 0x02 dlen 11 #35 [hci0] 3.883020 SMP: Pairing Response (0x02) len 6 IO capability: DisplayYesNo (0x01) OOB data: Authentication data not present (0x00) Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d) Max encryption key size: 7 Initiator key distribution: EncKey Sign (0x05) Responder key distribution: EncKey IdKey Sign (0x07) < ACL Data TX: Handle 3585 flags 0x00 dlen 6 #36 [hci0] 3.883136 SMP: Pairing Failed (0x05) len 1 Reason: Encryption key size (0x06) Signed-off-by: Łukasz Rymanowski <lukasz.rymanowski@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-05-18esp4: improve xfrm4_beet_gso_segment() to be more readableXin Long
This patch is to improve the code to make xfrm4_beet_gso_segment() more readable, and keep consistent with xfrm6_beet_gso_segment(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-05-17rds: convert get_user_pages() --> pin_user_pages()John Hubbard
This code was using get_user_pages_fast(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages_fast() + put_page() calls to pin_user_pages_fast() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: rds-devel@oss.oracle.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17net: allow __skb_ext_alloc to sleepFlorian Westphal
mptcp calls this from the transmit side, from process context. Allow a sleeping allocation instead of unconditional GFP_ATOMIC. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: remove inner wait loop from mptcp_sendmsg_fragFlorian Westphal
previous patches made sure we only call into this function when these prerequisites are met, so no need to wait on the subflow socket anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: fill skb page frag cache outside of mptcp_sendmsg_fragFlorian Westphal
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is another preparation patch that makes sure we call mptcp_sendmsg_frag only if the page frag cache has been refilled. Followup patch will remove the wait loop from mptcp_sendmsg_frag(). The retransmit worker doesn't need to do this refill as it won't transmit new mptcp-level data. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: fill skb extension cache outside of mptcp_sendmsg_fragFlorian Westphal
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is a preparation patch that makes sure we call mptcp_sendmsg_frag only if a skb extension has been allocated. Moreover, such allocation currently uses GFP_ATOMIC while it could use sleeping allocation instead. Followup patches will remove the wait loop from mptcp_sendmsg_frag() and will allow to do a sleeping allocation for the extension. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: avoid blocking in tcp_sendpagesFlorian Westphal
The transmit loop continues to xmit new data until an error is returned or all data was transmitted. For the blocking i/o case, this means that tcp_sendpages() may block on the subflow until more space becomes available, i.e. we end up sleeping with the mptcp socket lock held. Instead we should check if a different subflow is ready to be used. This restarts the subflow sk lookup when the tx operation succeeded and the tcp subflow can't accept more data or if tcp_sendpages indicates -EAGAIN on a blocking mptcp socket. In that case we also need to set the NOSPACE bit to make sure we get notified once memory becomes available. In case all subflows are busy, the existing logic will wait until a subflow is ready, releasing the mptcp socket lock while doing so. The mptcp worker already sets DONTWAIT, so no need to make changes there. v2: * set NOSPACE bit * add a comment to clarify that mptcp-sk sndbuf limits need to be checked as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: break and restart in case mptcp sndbuf is fullFlorian Westphal
Its not enough to check for available tcp send space. We also hold on to transmitted data for mptcp-level retransmits. Right now we will send more and more data if the peer can ack data at the tcp level fast enough, since that frees up tcp send buffer space. But we also need to check that data was acked and reclaimed at the mptcp level. Therefore add needed check in mptcp_sendmsg, flush tcp data and wait until more mptcp snd space becomes available if we are over the limit. Before we wait for more data, also make sure we start the retransmit timer if we ran out of sndbuf space. Otherwise there is a very small chance that we wait forever: * receiver is waiting for data * sender is blocked because mptcp socket buffer is full * at tcp level, all data was acked * mptcp-level snd_una was not updated, because last ack that acknowledged the last data packet carried an older MPTCP-ack. Restarting the retransmit timer avoids this problem: if TCP subflow is idle, data is retransmitted from the RTX queue. New data will make the peer send a new, updated MPTCP-Ack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: move common nospace-pattern to a helperFlorian Westphal
Paolo noticed that ssk_check_wmem() has same pattern, so add/use common helper for both places. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"Yuqi Jin
Commit adb03115f459 ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17nexthop: Fix attribute checking for groupsDavid Ahern
For nexthop groups, attributes after NHA_GROUP_TYPE are invalid, but nh_check_attr_group starts checking at NHA_GROUP. The group type defaults to multipath and the NHA_GROUP_TYPE is currently optional so this has slipped through so far. Fix the attribute checking to handle support of new group types. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: ASSOGBA Emery <assogba.emery@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17bpfilter: use 'userprogs' syntax to build bpfilter_umhMasahiro Yamada
The user mode helper should be compiled for the same architecture as the kernel. This Makefile reused the 'hostprogs' syntax by overriding HOSTCC with CC. Use the new syntax 'userprogs' to fix the Makefile mess. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org>
2020-05-17bpfilter: check if $(CC) can link static libc in KconfigMasahiro Yamada
On Fedora, linking static glibc requires the glibc-static RPM package, which is not part of the glibc-devel package. CONFIG_CC_CAN_LINK does not check the capability of static linking, so you can enable CONFIG_BPFILTER_UMH, then fail to build: HOSTLD net/bpfilter/bpfilter_umh /usr/bin/ld: cannot find -lc collect2: error: ld returned 1 exit status Add CONFIG_CC_CAN_LINK_STATIC, and make CONFIG_BPFILTER_UMH depend on it. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org>
2020-05-17bpfilter: match bit size of bpfilter_umh to that of the kernelMasahiro Yamada
bpfilter_umh is built for the default machine bit of the compiler, which may not match to the bit size of the kernel. This happens in the scenario below: You can use biarch GCC that defaults to 64-bit for building the 32-bit kernel. In this case, Kbuild passes -m32 to teach the compiler to produce 32-bit kernel space objects. However, it is missing when building bpfilter_umh. It is built as a 64-bit ELF, and then embedded into the 32-bit kernel. The 32-bit kernel and 64-bit umh is a bad combination. In theory, we can have 32-bit umh running on 64-bit kernel, but we do not have a good reason to support such a usecase. The best is to match the bit size between them. Pass -m32 or -m64 to the umh build command if it is found in $(KBUILD_CFLAGS). Evaluate CC_CAN_LINK against the kernel bit-size. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-05-16ethtool: don't call set_channels in drivers if config didn't changeJakub Kicinski
Don't call drivers if nothing changed. Netlink code already contains this logic. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16ethtool: check if there is at least one channel for TX/RX in the coreJakub Kicinski
Having a channel config with no ability to RX or TX traffic is clearly wrong. Check for this in the core so the drivers don't have to. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16mptcp: Use 32-bit DATA_ACK when possibleChristoph Paasch
RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not sending 64-bit data-sequence numbers. The 64-bit DSN is only there for extreme scenarios when a very high throughput subflow is combined with a long-RTT subflow such that the high-throughput subflow wraps around the 32-bit sequence number space within an RTT of the high-RTT subflow. It is thus a rare scenario and we should try to use the 32-bit DATA_ACK instead as long as possible. It allows to reduce the TCP-option overhead by 4 bytes, thus makes space for an additional SACK-block. It also makes tcpdumps much easier to read when the DSN and DATA_ACK are both either 32 or 64-bit. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16net: dsa: mt7530: fix roaming from DSA user portsDENG Qingfang
When a client moves from a DSA user port to a software port in a bridge, it cannot reach any other clients that connected to the DSA user ports. That is because SA learning on the CPU port is disabled, so the switch ignores the client's frames from the CPU port and still thinks it is at the user port. Fix it by enabling SA learning on the CPU port. To prevent the switch from learning from flooding frames from the CPU port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames, and let the switch flood them instead of trapping to the CPU port. Multicast frames still need to be trapped to the CPU port for snooping, so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames to disable SA learning. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16netns: enable to inherit devconf from current netnsNicolas Dichtel
The goal is to be able to inherit the initial devconf parameters from the current netns, ie the netns where this new netns has been created. This is useful in a containers environment where /proc/sys is read only. For example, if a pod is created with specifics devconf parameters and has the capability to create netns, the user expects to get the same parameters than his 'init_net', which is not the real init_net in this case. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16ipv6: Fix suspicious RCU usage warning in ip6mrMadhuparna Bhowmik
This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! ipmr_new_table() returns an existing table, but there is no table at init. Therefore the condition: either holding rtnl or the list is empty is used. Fixes: d1db275dd3f6e ("ipv6: ip6mr: support multiple tables") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - nfs: fix NULL deference in nfs4_get_valid_delegation Bugfixes: - Fix corruption of the return value in cachefiles_read_or_alloc_pages() - Fix several fscache cookie issues - Fix a fscache queuing race that can trigger a BUG_ON - NFS: Fix two use-after-free regressions due to the RPC_TASK_CRED_NOREF flag - SUNRPC: Fix a use-after-free regression in rpc_free_client_work() - SUNRPC: Fix a race when tearing down the rpc client debugfs directory - SUNRPC: Signalled ASYNC tasks need to exit - NFSv3: fix rpc receive buffer size for MOUNT call" * tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: fix rpc receive buffer size for MOUNT call SUNRPC: 'Directory with parent 'rpc_clnt' already present!' NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfs NFS: Don't use RPC_TASK_CRED_NOREF with delegreturn SUNRPC: Signalled ASYNC tasks need to exit nfs: fix NULL deference in nfs4_get_valid_delegation SUNRPC: fix use-after-free in rpc_free_client_work() cachefiles: Fix race between read_waiter and read_copier involving op->to_do NFSv4: Fix fscache cookie aux_data to ensure change_attr is included NFS: Fix fscache super_cookie allocation NFS: Fix fscache super_cookie index_key from changing after umount cachefiles: Fix corruption of the return value in cachefiles_read_or_alloc_pages()
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
2020-05-15mptcp: cope better with MP_JOIN failurePaolo Abeni
Currently, on MP_JOIN failure we reset the child socket, but leave the request socket untouched. tcp_check_req will deal with it according to the 'tcp_abort_on_overflow' sysctl value - by default the req socket will stay alive. The above leads to inconsistent behavior on MP JOIN failure, and bad listener overflow accounting. This patch addresses the issue leveraging the infrastructure just introduced to ask the TCP stack to drop the req on failure. The child socket is not freed anymore by subflow_syn_recv_sock(), instead it's moved to a dead state and will be disposed by the next sock_put done by the TCP stack, so that listener overflow accounting is not affected by MP JOIN failure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15inet_connection_sock: factor out destroy helper.Paolo Abeni
Move the steps to prepare an inet_connection_sock for forced disposal inside a separate helper. No functional changes inteded, this will just simplify the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15mptcp: add new sock flag to deal with join subflowsPaolo Abeni
MP_JOIN subflows must not land into the accept queue. Currently tcp_check_req() calls an mptcp specific helper to detect such scenario. Such helper leverages the subflow context to check for MP_JOIN subflows. We need to deal also with MP JOIN failures, even when the subflow context is not available due allocation failure. A possible solution would be changing the syn_recv_sock() signature to allow returning a more descriptive action/ error code and deal with that in tcp_check_req(). Since the above need is MPTCP specific, this patch instead uses a TCP request socket hole to add a MPTCP specific flag. Such flag is used by the MPTCP syn_recv_sock() to tell tcp_check_req() how to deal with the request socket. This change is a no-op for !MPTCP build, and makes the MPTCP code simpler. It allows also the next patch to deal correctly with MP JOIN failure. v1 -> v2: - be more conservative on drop_req initialization (Mat) RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 37 non-merge commits during the last 1 day(s) which contain a total of 67 files changed, 741 insertions(+), 252 deletions(-). The main changes are: 1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper. 2) bpftool can probe CONFIG_HZ, from Daniel. 3) CAP_BPF is introduced to isolate user processes that use BPF infra and to secure BPF networking services by dropping CAP_SYS_ADMIN requirement in certain cases, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net: sched: cls_flower: implement terse dump supportVlad Buslov
Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only dump handle, flags and action data in terse mode. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15net: sched: implement terse dump support in actVlad Buslov
Extend tcf_action_dump() with boolean argument 'terse' that is used to request terse-mode action dump. In terse mode only essential data needed to identify particular action (action kind, cookie, etc.) and its stats is put to resulting skb and everything else is omitted. Implement tcf_exts_terse_dump() helper in cls API that is intended to be used to request terse dump of all exts (actions) attached to the filter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>