summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-12-11rfkill: add a reason to the HW rfkill stateEmmanuel Grumbach
The WLAN device may exist yet not be usable. This can happen when the WLAN device is controllable by both the host and some platform internal component. We need some arbritration that is vendor specific, but when the device is not available for the host, we need to reflect this state towards the user space. Add a reason field to the rfkill object (and event) so that userspace can know why the device is in rfkill: because some other platform component currently owns the device, or because the actual hw rfkill signal is asserted. Capable userspace can now determine the reason for the rfkill and possibly do some negotiation on a side band channel using a proprietary protocol to gain ownership on the device in case the device is owned by some other component. When the host gains ownership on the device, the kernel can remove the RFKILL_HARD_BLOCK_NOT_OWNER reason and the hw rfkill state will be off. Then, the userspace can bring the device up and start normal operation. The rfkill_event structure is enlarged to include the additional byte, it is now 9 bytes long. Old user space will ask to read only 8 bytes so that the kernel can know not to feed them with more data. When the user space writes 8 bytes, new kernels will just read what is present in the file descriptor. This new byte is read only from the userspace standpoint anyway. If a new user space uses an old kernel, it'll ask to read 9 bytes but will get only 8, and it'll know that it didn't get the new state. When it'll write 9 bytes, the kernel will again ignore this new byte which is read only from the userspace standpoint. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20201104134641.28816-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-12-10 The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 12 day(s) which contain a total of 21 files changed, 163 insertions(+), 88 deletions(-). The main changes are: 1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei. 2) Fix ring_buffer__poll() return value, from Andrii. 3) Fix race in lwt_bpf, from Cong. 4) Fix test_offload, from Toke. 5) Various xsk fixes. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10rtnetlink: RCU-annotate both dimensions of rtnl_msg_handlersJakub Kicinski
We use rcu_assign_pointer to assign both the table and the entries, but the entries are not marked as __rcu. This generates sparse warnings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10tcp: correctly handle increased zerocopy args struct sizeArjun Roy
A prior patch increased the size of struct tcp_zerocopy_receive but did not update do_tcp_getsockopt() handling to properly account for this. This patch simply reintroduces content erroneously cut from the referenced prior patch that handles the new struct size. Fixes: 18fb76ed5386 ("net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.") Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10can: isotp: add SF_BROADCAST support for functional addressingOliver Hartkopp
When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP socket is switched into functional addressing mode, where only single frame (SF) protocol data units can be send on the specified CAN interface and the given tp.tx_id after bind(). In opposite to normal and extended addressing this socket does not register a CAN-ID for reception which would be needed for a 1-to-1 ISOTP connection with a segmented bi-directional data transfer. Sending SFs on this socket is therefore a TX-only 'broadcast' operation. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Thomas Wagner <thwa1@web.de> Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-12-09net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flowerGuillaume Nault
TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is 20 bits long). Fixes the following bug: $ tc filter add dev ethX ingress protocol mpls_uc \ flower mpls lse depth 2 label 256 \ action drop $ tc filter show dev ethX ingress filter protocol mpls_uc pref 49152 flower chain 0 filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1 eth_type 8847 mpls lse depth 2 label 0 <-- invalid label 0, should be 256 ... Fixes: 61aec25a6db5 ("cls_flower: Support filtering on multiple MPLS Label Stack Entries") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09net: x25: Fix handling of Restart Request and Restart ConfirmationXie He
1. When the x25 module gets loaded, layer 2 may already be running and connected. In this case, although we are in X25_LINK_STATE_0, we still need to handle the Restart Request received, rather than ignore it. 2. When we are in X25_LINK_STATE_2, we have already sent a Restart Request and is waiting for the Restart Confirmation with t20timer. t20timer will restart itself repeatedly forever so it will always be there, as long as we are in State 2. So we don't need to check x25_t20timer_pending again. Fixes: d023b2b9ccc2 ("net/x25: fix restart request/confirm handling") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: be careful on subflows shutdownPaolo Abeni
When the workqueue disposes of the msk, the subflows can still receive some data from the peer after __mptcp_close_ssk() completes. The above could trigger a race between the msk receive path and the msk destruction. Acquiring the mptcp_data_lock() in __mptcp_destroy_sock() will not save the day: the rx path could be reached even after msk destruction completes. Instead use the subflow 'disposable' flag to prevent entering the msk receive path after __mptcp_close_ssk(). Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: plug subflow context memory leakPaolo Abeni
When a MPTCP listener socket is closed with unaccepted children pending, the ULP release callback will be invoked, but nobody will call into __mptcp_close_ssk() on the corresponding subflow. As a consequence, at ULP release time, the 'disposable' flag will be cleared and the subflow context memory will be leaked. This change addresses the issue always freeing the context if the subflow is still in the accept queue at ULP release time. Additionally, this fixes an incorrect code reference in the related comment. Note: this fix leverages the changes introduced by the previous commit. Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: link MPC subflow into msk only after acceptPaolo Abeni
Christoph reported the following splat: WARNING: CPU: 0 PID: 4615 at net/ipv4/inet_connection_sock.c:1031 inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Modules linked in: CPU: 0 PID: 4615 Comm: syz-executor.4 Not tainted 5.9.0 #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Code: 03 00 00 00 e8 79 b2 3d ff e9 ad f9 ff ff e8 1f 76 ba fe be 02 00 00 00 4c 89 f7 e8 62 b2 3d ff e9 14 f9 ff ff e8 08 76 ba fe <0f> 0b e9 97 f8 ff ff e8 fc 75 ba fe be 03 00 00 00 4c 89 f7 e8 3f RSP: 0018:ffffc900037f7948 EFLAGS: 00010293 RAX: ffff88810a349c80 RBX: ffff888114ee1b00 RCX: ffffffff827b14cd RDX: 0000000000000000 RSI: ffffffff827b1c38 RDI: 0000000000000005 RBP: ffff88810a2a8000 R08: ffff88810a349c80 R09: fffff520006fef1f R10: 0000000000000003 R11: fffff520006fef1e R12: ffff888114ee2d00 R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888114ee1d68 FS: 00007f2ac1945700(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd44798bc0 CR3: 0000000109810002 CR4: 0000000000170ef0 Call Trace: __tcp_close+0xd86/0x1110 net/ipv4/tcp.c:2433 __mptcp_close_ssk+0x256/0x430 net/mptcp/protocol.c:1761 __mptcp_destroy_sock+0x49b/0x770 net/mptcp/protocol.c:2127 mptcp_close+0x62d/0x910 net/mptcp/protocol.c:2184 inet_release+0xe9/0x1f0 net/ipv4/af_inet.c:434 __sock_release+0xd2/0x280 net/socket.c:596 sock_close+0x15/0x20 net/socket.c:1277 __fput+0x276/0x960 fs/file_table.c:281 task_work_run+0x109/0x1d0 kernel/task_work.c:151 get_signal+0xe8f/0x1d40 kernel/signal.c:2561 arch_do_signal+0x88/0x1b60 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0x9b/0xf0 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x22/0x150 kernel/entry/common.c:266 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f2ac1254469 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 RSP: 002b:00007f2ac1944dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffbf RBX: 000000000069bf00 RCX: 00007f2ac1254469 RDX: 0000000000000000 RSI: 0000000000008982 RDI: 0000000000000003 RBP: 000000000069bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000069bf0c R13: 00007ffeb53f178f R14: 00000000004668b0 R15: 0000000000000003 After commit 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list"), the msk's workqueue and/or PM can touch the MPC subflow - and acquire its socket lock - even if it's still unaccepted. If the above event races with the relevant listener socket close, we can end-up with the above splat. This change addresses the issue delaying the MPC socket insertion in conn_list at accept time - that is, partially reverting the blamed commit. We must additionally ensure that mptcp_pm_fully_established() happens after accept() time, or the PM will not be able to handle properly such event - conn_list could be empty otherwise. In the receive path, we check the subflow list node to ensure it is out of the listener queue. Be sure client subflows do not match transiently such condition moving them into the join list earlier at creation time. Since we now have multiple mptcp_pm_fully_established() call sites from different code-paths, said helper can now race with itself. Use an additional PM status bit to avoid multiple notifications. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/103 Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list"), Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: use the variable sk instead of open-codingGeliang Tang
Since the local variable sk has been defined, use it instead of open-coding. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: rename add_addr_signal and mptcp_add_addr_statusGeliang Tang
Since the RM_ADDR signal had been reused with add_addr_signal, it's not suitable to call it add_addr_signal or mptcp_add_addr_status. So this patch renamed add_addr_signal to addr_signal, and renamed mptcp_add_addr_status to mptcp_addr_signal_status. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: drop rm_addr_signal flagGeliang Tang
This patch reused add_addr_signal for the RM_ADDR announcing signal, by defining a new ADD_ADDR status named MPTCP_RM_ADDR_SIGNAL. Then the flag rm_addr_signal in PM could be dropped. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: print out port and ahmac when receiving ADD_ADDRGeliang Tang
This patch printed out more debugging information for the ADD_ADDR suboption parsing on the incoming path. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: add port parameter for mptcp_pm_announce_addrGeliang Tang
This patch added a new parameter 'port' for mptcp_pm_announce_addr. If this parameter is true, we set the MPTCP_ADD_ADDR_PORT bit of the add_addr_signal. That means the announced address is added with a port number. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: send out dedicated packet for ADD_ADDR using portGeliang Tang
The process is similar to that of the ADD_ADDR IPv6, this patch also sent out a pure ack for the ADD_ADDR using port. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: add the outgoing ADD_ADDR port supportGeliang Tang
This patch added a new add_addr_signal type named MPTCP_ADD_ADDR_PORT, to identify it is an address with port to be added. It also added a new parameter 'port' for both mptcp_add_addr_len and mptcp_pm_add_addr_signal. In mptcp_established_options_add_addr, we check whether the announced address is added with port. If it is, we put this port number to mptcp_out_options's port field. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: use adding up size to get ADD_ADDR lengthGeliang Tang
This patch uses adding up size to get the ADD_ADDR suboption length rather than returning the ADD_ADDR size constants. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: add port support for ADD_ADDR suboption writingGeliang Tang
In rfc8684, the length of ADD_ADDR suboption with IPv4 address and port is 18 octets, but mptcp_write_options is 32-bit aligned, so we need to pad it to 20 octets. All the other port related option lengths need to be added up 2 octets similarly. This patch added a new field 'port' in mptcp_out_options. When this field is set with a port number, we need to add up 4 octets for the ADD_ADDR suboption, and put the port number into the suboption. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: unify ADD_ADDR and ADD_ADDR6 suboptions writingGeliang Tang
The length of ADD_ADDR6 is 12 octets longer than ADD_ADDR. That's the only difference between them. This patch dropped the duplicate code between ADD_ADDR and ADD_ADDR6 suboptions writing, and unify them into one. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: unify ADD_ADDR and echo suboptions writingGeliang Tang
There are two differences between ADD_ADDR suboption and ADD_ADDR echo suboption: The length of the former is 8 octets longer than the length of the latter. The former's echo-flag is 0, and latter's echo-flag is 1. This patch added two local variables, len and echo, to unify ADD_ADDR and ADD_ADDR echo suboptions writing. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Switch to RCU in x_tables to fix possible NULL pointer dereference, from Subash Abhinov Kasiviswanathan. 2) Fix netlink dump of dynset timeouts later than 23 days. 3) Add comment for the indirect serialization of the nft commit mutex with rtnl_mutex. 4) Remove bogus check for confirmed conntrack when matching on the conntrack ID, from Brett Mastbergen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09net: rxrpc: convert comma to semicolonZheng Yongjun
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09tcp: fix cwnd-limited bug for TSO deferral where we send nothingNeal Cardwell
When cwnd is not a multiple of the TSO skb size of N*MSS, we can get into persistent scenarios where we have the following sequence: (1) ACK for full-sized skb of N*MSS arrives -> tcp_write_xmit() transmit full-sized skb with N*MSS -> move pacing release time forward -> exit tcp_write_xmit() because pacing time is in the future (2) TSQ callback or TCP internal pacing timer fires -> try to transmit next skb, but TSO deferral finds remainder of available cwnd is not big enough to trigger an immediate send now, so we defer sending until the next ACK. (3) repeat... So we can get into a case where we never mark ourselves as cwnd-limited for many seconds at a time, even with bulk/infinite-backlog senders, because: o In case (1) above, every time in tcp_write_xmit() we have enough cwnd to send a full-sized skb, we are not fully using the cwnd (because cwnd is not a multiple of the TSO skb size). So every time we send data, we are not cwnd limited, and so in the cwnd-limited tracking code in tcp_cwnd_validate() we mark ourselves as not cwnd-limited. o In case (2) above, every time in tcp_write_xmit() that we try to transmit the "remainder" of the cwnd but defer, we set the local variable is_cwnd_limited to true, but we do not send any packets, so sent_pkts is zero, so we don't call the cwnd-limited logic to update tp->is_cwnd_limited. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09net: flow_offload: Fix memory leak for indirect flow blockChris Mi
The offending commit introduces a cleanup callback that is invoked when the driver module is removed to clean up the tunnel device flow block. But it returns on the first iteration of the for loop. The remaining indirect flow blocks will never be freed. Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure") CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
2020-12-09tcp: Retain ECT bits for tos reflectionWei Wang
For DCTCP, we have to retain the ECT bits set by the congestion control algorithm on the socket when reflecting syn TOS in syn-ack, in order to make ECN work properly. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Reported-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09ethtool: fix stack overflow in ethnl_parse_bitset()Michal Kubecek
Syzbot reported a stack overflow in bitmap_from_arr32() called from ethnl_parse_bitset() when bitset from netlink message is longer than target bitmap length. While ethnl_compact_sanity_checks() makes sure that trailing part is all zeros (i.e. the request does not try to touch bits kernel does not recognize), we also need to cap change_bits to nbits so that we don't try to write past the prepared bitmaps. Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper") Reported-by: syzbot+9d39fa49d4df294aab93@syzkaller.appspotmail.com Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09net: sched: incorrect Kconfig dependencies on Netfilter modulesPablo Neira Ayuso
- NET_ACT_CONNMARK and NET_ACT_CTINFO only require conntrack support. - NET_ACT_IPT only requires NETFILTER_XTABLES symbols, not IP_NF_IPTABLES. After this patch, NET_ACT_IPT becomes consistent with NET_EMATCH_IPT. NET_ACT_IPT dependency on IP_NF_IPTABLES predates Linux-2.6.12-rc2 (initial git repository build). Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action") Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20201208204707.11268-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09can: isotp: isotp_setsockopt(): block setsockopt on bound socketsOliver Hartkopp
The isotp socket can be widely configured in its behaviour regarding addressing types, fill-ups, receive pattern tests and link layer length. Usually all these settings need to be fixed before bind() and can not be changed afterwards. This patch adds a check to enforce the common usage pattern. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Thomas Wagner <thwa1@web.de> Link: https://lore.kernel.org/r/20201203140604.25488-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20201204133508.742120-3-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09xdp: Remove the xdp_attachment_flags_ok() callbackToke Høiland-Jørgensen
Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device"), the XDP program attachment info is now maintained in the core code. This interacts badly with the xdp_attachment_flags_ok() check that prevents unloading an XDP program with different load flags than it was loaded with. In practice, two kinds of failures are seen: - An XDP program loaded without specifying a mode (and which then ends up in driver mode) cannot be unloaded if the program mode is specified on unload. - The dev_xdp_uninstall() hook always calls the driver callback with the mode set to the type of the program but an empty flags argument, which means the flags_ok() check prevents the program from being removed, leading to bpf prog reference leaks. The original reason this check was added was to avoid ambiguity when multiple programs were loaded. With the way the checks are done in the core now, this is quite simple to enforce in the core code, so let's add a check there and get rid of the xdp_attachment_flags_ok() callback entirely. Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
2020-12-09SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcallChuck Lever
There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
2020-12-09sunrpc: clean-up cache downcallRoberto Bergantinos Corpas
We can simplify code around cache_downcall unifying memory allocations using kvmalloc. This has the benefit of getting rid of cache_slow_downcall (and queue_io_mutex), and also matches userland allocation size and limits. Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09netfilter: nft_ct: Remove confirmation check for NFT_CT_IDBrett Mastbergen
Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation") the ct id will not change from initialization to confirmation. Removing the confirmation check allows for things like adding an element to a 'typeof ct id' set in prerouting upon reception of the first packet of a new connection, and then being able to reference that set consistently both before and after the connection is confirmed. Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation") Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08bpf: Only provide bpf_sock_from_file with CONFIG_NETFlorent Revest
This moves the bpf_sock_from_file definition into net/core/filter.c which only gets compiled with CONFIG_NET and also moves the helper proto usage next to other tracing helpers that are conditional on CONFIG_NET. This avoids ld: kernel/trace/bpf_trace.o: in function `bpf_sock_from_file': bpf_trace.c:(.text+0xe23): undefined reference to `sock_from_file' When compiling a kernel with BPF and without NET. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20201208173623.1136863-1-revest@chromium.org
2020-12-08tcp: select sane initial rcvq_space.space for big MSSEric Dumazet
Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. This is no longer the case, and Hazem Mohamed Abuelfotoh reported that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit of rcvq_space.space computation, while it can select later a smaller value for tp->rcv_ssthresh and tp->window_clamp. ss -temoi on receiver would show : skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 This means that TCP can not increase its window in tcp_grow_window(), and that DRS can never kick. Fix this by making sure that rcvq_space.space is not bigger than number of bytes that can be held in TCP receive queue. People unable/unwilling to change their kernel can work around this issue by selecting a bigger tcp_rmem[1] value as in : echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem Based on an initial report and patch from Hazem Mohamed Abuelfotoh https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net: openvswitch: conntrack: simplify the return expression of ↵Zheng Yongjun
ovs_ct_limit_get_default_limit() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net: core: devlink: simplify the return expression of ↵Zheng Yongjun
devlink_nl_cmd_trap_set_doit() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net: ipv6: rpl_iptunnel: simplify the return expression of rpl_do_srh()Zheng Yongjun
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net/sched: cls_u32: simplify the return expression of u32_reoffload_knode()Zheng Yongjun
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net: sched: fix spelling mistake in Kconfig "trys" -> "tries"Colin Ian King
There is a spelling mistake in the Kconfig help text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-12-07 Here's the main bluetooth-next pull request for the 5.11 kernel. - Updated Bluetooth entries in MAINTAINERS to include Luiz von Dentz - Added support for Realtek 8822CE and 8852A devices - Added support for MediaTek MT7615E device - Improved workarounds for fake CSR devices - Fix Bluetooth qualification test case L2CAP/COS/CFD/BV-14-C - Fixes for LL Privacy support - Enforce 16 byte encryption key size for FIPS security level - Added new mgmt commands for extended advertising support - Multiple other smaller fixes & improvements Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddrJulian Wiedmann
This gets us compile-time size checking. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08net: tipc: prevent possible null deref of linkCengiz Can
`tipc_node_apply_property` does a null check on a `tipc_link_entry` pointer but also accesses the same pointer out of the null check block. This triggers a warning on Coverity Static Analyzer because we're implying that `e->link` can BE null. Move "Update MTU for node link entry" line into if block to make sure that we're not in a state that `e->link` is null. Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08netfilter: nftables: comment indirect serialization of commit_mutex with ↵Pablo Neira Ayuso
rtnl_mutex Add an explicit comment in the code to describe the indirect serialization of the holders of the commit_mutex with the rtnl_mutex. Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on netdevice from preparation phase") already describes this, but a comment in this case is better for reference. Reported-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08netfilter: nft_dynset: fix timeouts later than 23 daysPablo Neira Ayuso
Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by 8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23 days"), otherwise ruleset listing breaks. Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08net: dsa: print the MTU value that could not be setRasmus Villemoes
These warnings become somewhat more informative when they include the MTU value that could not be set and not just the errno. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-08xsk: Validate socket state in xsk_recvmsg, prior touching socket membersBjörn Töpel
In AF_XDP the socket state needs to be checked, prior touching the members of the socket. This was not the case for the recvmsg implementation. Fix that by moving the xsk_is_bound() call. Fixes: 45a86681844e ("xsk: Add support for recvmsg()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201207082008.132263-1-bjorn.topel@gmail.com
2020-12-08netfilter: x_tables: Switch synchronization to RCUSubash Abhinov Kasiviswanathan
When running concurrent iptables rules replacement with data, the per CPU sequence count is checked after the assignment of the new information. The sequence count is used to synchronize with the packet path without the use of any explicit locking. If there are any packets in the packet path using the table information, the sequence count is incremented to an odd value and is incremented to an even after the packet process completion. The new table value assignment is followed by a write memory barrier so every CPU should see the latest value. If the packet path has started with the old table information, the sequence counter will be odd and the iptables replacement will wait till the sequence count is even prior to freeing the old table info. However, this assumes that the new table information assignment and the memory barrier is actually executed prior to the counter check in the replacement thread. If CPU decides to execute the assignment later as there is no user of the table information prior to the sequence check, the packet path in another CPU may use the old table information. The replacement thread would then free the table information under it leading to a use after free in the packet processing context- Unable to handle kernel NULL pointer dereference at virtual address 000000000000008e pc : ip6t_do_table+0x5d0/0x89c lr : ip6t_do_table+0x5b8/0x89c ip6t_do_table+0x5d0/0x89c ip6table_filter_hook+0x24/0x30 nf_hook_slow+0x84/0x120 ip6_input+0x74/0xe0 ip6_rcv_finish+0x7c/0x128 ipv6_rcv+0xac/0xe4 __netif_receive_skb+0x84/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c This could be fixed by forcing instruction order after the new table information assignment or by switching to RCU for the synchronization. Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore") Reported-by: Sean Tranchetti <stranche@codeaurora.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-07Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-12-07 1) Sysbot reported fixes for the new 64/32 bit compat layer. From Dmitry Safonov. 2) Fix a memory leak in xfrm_user_policy that was introduced by adding the 64/32 bit compat layer. From Yu Kuai. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: net: xfrm: fix memory leak in xfrm_user_policy() xfrm/compat: Don't allocate memory with __GFP_ZERO xfrm/compat: memset(0) 64-bit padding at right place xfrm/compat: Translate by copying XFRMA_UNSPEC attribute ==================== Link: https://lore.kernel.org/r/20201207093937.2874932-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07mptcp: print new line in mptcp_seq_show() if mptcp isn't in useJianguo Wu
When do cat /proc/net/netstat, the output isn't append with a new line, it looks like this: [root@localhost ~]# cat /proc/net/netstat ... MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]# This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL, so it just puts all 0 after "MPTcpExt:", and return, forgot the '\n'. After this patch: [root@localhost ~]# cat /proc/net/netstat ... MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 [root@localhost ~]# Fixes: fc518953bc9c8d7d ("mptcp: add and use MIB counter infrastructure") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Acked-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/142e2fd9-58d9-bb13-fb75-951cccc2331e@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>