summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-03-10net: nfc: nci: drop nci_uart_default_recvJiri Slaby
nci_uart_register returns -EINVAL immediately when nu->ops.recv is not set. So the same 'if' later never triggers so nci_uart_default_recv is never used. Drop it. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210302062214.29627-17-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10net: nfc: nci: drop nci_uart_ops::recv_bufJiri Slaby
There is noone setting nci_uart_ops::recv_buf, so the default one (nci_uart_default_recv_buf) is always used. So drop this hook, move nci_uart_default_recv_buf before the use in nci_uart_tty_receive and remove unused parameter flags. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210302062214.29627-16-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10net: nfc: nci: remove memset of nci_uart_driversJiri Slaby
nci_uart_drivers is a global definition, so there is no need to initialize its memory to zero during module load. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210302062214.29627-15-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-10tty: remove TTY_LDISC_MAGICJiri Slaby
First, it is never checked. Second, use of it as a debugging aid is at least questionable. With the current tools, I don't think anyone used this kind of thing for debugging purposes for years. On the top of that, e.g. serdev does not set this field of tty_ldisc_ops at all. So get rid of this legacy. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210302062214.29627-8-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau. 2) TX skb error handling fix in mt76 driver, also from Felix. 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman. 4) Avoid double free of percpu pointers when freeing a cloned bpf prog. From Cong Wang. 5) Use correct printf format for dma_addr_t in ath11k, from Geert Uytterhoeven. 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan Hsieh. 7) Don't report truncated frames to mac80211 in mt76 driver, from Lorenzop Bianconi. 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang. 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann. 10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from Arjun Roy. 11) Ignore routes with deleted nexthop object in mlxsw, from Ido Schimmel. 12) Need to undo tcp early demux lookup sometimes in nf_nat, from Florian Westphal. 13) Fix gro aggregation for udp encaps with zero csum, from Daniel Borkmann. 14) Make sure to always use imp*_ndo_send when necessaey, from Jason A. Donenfeld. 15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov. 16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin. 17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir Oltean. 18) Make sure textsearch copntrol block is large enough, from Wilem de Bruijn. 19) Revert MAC changes to r8152 leading to instability, from Hates Wang. 20) Advance iov in 9p even for empty reads, from Jissheng Zhang. 21) Double hook unregister in nftables, from PabloNeira Ayuso. 22) Fix memleak in ixgbe, fropm Dinghao Liu. 23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne. 24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang Tang. 25) Fix DOI refcount bugs in cipso, from Paul Moore. 26) One too many irqsave in ibmvnic, from Junlin Yang. 27) Fix infinite loop with MPLS gso segmenting via virtio_net, from Balazs Nemeth. * git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits) s390/qeth: fix notification for pending buffers during teardown s390/qeth: schedule TX NAPI on QAOB completion s390/qeth: improve completion of pending TX buffers s390/qeth: fix memory leak after failed TX Buffer allocation net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 net: check if protocol extracted by virtio_net_hdr_set_proto is correct net: dsa: xrs700x: check if partner is same as port in hsr join net: lapbether: Remove netif_start_queue / netif_stop_queue atm: idt77252: fix null-ptr-dereference atm: uPD98402: fix incorrect allocation atm: fix a typo in the struct description net: qrtr: fix error return code of qrtr_sendmsg() mptcp: fix length of ADD_ADDR with port sub-option net: bonding: fix error return code of bond_neigh_init() net: enetc: allow hardware timestamping on TX queues with tc-etf enabled net: enetc: set MAC RX FIFO to recommended value net: davicom: Use platform_get_irq_optional() net: davicom: Fix regulator not turned off on driver removal net: davicom: Fix regulator not turned off on failed probe net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports ...
2021-03-09net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0Balazs Nemeth
A packet with skb_inner_network_header(skb) == skb_network_header(skb) and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers from the packet. Subsequently, the call to skb_mac_gso_segment will again call mpls_gso_segment with the same packet leading to an infinite loop. In addition, ensure that the header length is a multiple of four, which should hold irrespective of the number of stacked labels. Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08net: qrtr: fix error return code of qrtr_sendmsg()Jia-Ju Bai
When sock_alloc_send_skb() returns NULL to skb, no error return code of qrtr_sendmsg() is assigned. To fix this bug, rc is assigned with -ENOMEM in this case. Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08mptcp: fix length of ADD_ADDR with port sub-optionDavide Caratti
in current Linux, MPTCP peers advertising endpoints with port numbers use a sub-option length that wrongly accounts for the trailing TCP NOP. Also, receivers will only process incoming ADD_ADDR with port having such wrong sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1. this can be verified running tcpdump on the kselftests artifacts: unpatched kernel: [root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535 IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0 IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]] patched kernel: [root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535 IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0 IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0 Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing") CC: stable@vger.kernel.org # 5.11+ Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08bpf: BPF-helper for MTU checking add length inputJesper Dangaard Brouer
The FIB lookup example[1] show how the IP-header field tot_len (iph->tot_len) is used as input to perform the MTU check. This patch extend the BPF-helper bpf_check_mtu() with the same ability to provide the length as user parameter input, via mtu_len parameter. This still needs to be done before the bpf_check_mtu() helper API becomes frozen. [1] samples/bpf/xdp_fwd_kern.c Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161521555850.3515614.6533850861569774444.stgit@firesoul
2021-03-08SUNRPC: Set memalloc_nofs_save() for sync tasksBenjamin Coddington
We could recurse into NFS doing memory reclaim while sending a sync task, which might result in a deadlock. Set memalloc_nofs_save for sync task execution. Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08net: dsa: fix switchdev objects on bridge master mistakenly being applied on ↵Vladimir Oltean
ports Tobias reports that after the blamed patch, VLAN objects being added to a bridge device are being added to all slave ports instead (swp2, swp3). ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp3 master br0 bridge vlan add dev br0 vid 100 self This is because the fix was too broad: we made dsa_port_offloads_netdev say "yes, I offload the br0 bridge" for all slave ports, but we didn't add the checks whether the switchdev object was in fact meant for the physical port or for the bridge itself. So we are reacting on events in a way in which we shouldn't. The reason why the fix was too broad is because the question itself, "does this DSA port offload this netdev", was too broad in the first place. The solution is to disambiguate the question and separate it into two different functions, one to be called for each switchdev attribute / object that has an orig_dev == net_bridge (dsa_port_offloads_bridge), and the other for orig_dev == net_bridge_port (*_offloads_bridge_port). In the case of VLAN objects on the bridge interface, this solves the problem because we know that VLAN objects are per bridge port and not per bridge. And when orig_dev is equal to the net_bridge, we offload it as a bridge, but not as a bridge port; that's how we are able to skip reacting on those events. Note that this is compatible with future plans to have explicit offloading of VLAN objects on the bridge interface as a bridge port (in DSA, this signifies that we should add that VLAN towards the CPU port). Fixes: 99b8202b179f ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix incorrect enum type definition in nfnetlink_cthelper UAPI, from Dmitry V. Levin. 2) Remove extra space in deprecated automatic helper assignment notice, from Klemen Košir. 3) Drop early socket demux socket after NAT mangling, from Florian Westphal. Add a test to exercise this bug. 4) Fix bogus invalid packet report in the conntrack TCP tracker, also from Florian. 5) Fix access to xt[NFPROTO_UNSPEC] list with no mutex in target/match_revfn(), from Vasily Averin. 6) Disallow updates on the table ownership flag. 7) Fix double hook unregistration of tables with owner. 8) Remove bogus check on the table owner in __nft_release_tables(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-06rpc: fix NULL dereference on kmalloc failureJ. Bruce Fields
I think this is unlikely but possible: svc_authenticate sets rq_authop and calls svcauth_gss_accept. The kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL, and returning SVC_DENIED. This causes svc_process_common to go to err_bad_auth, and eventually call svc_authorise. That calls ->release == svcauth_gss_release, which tries to dereference rq_auth_data. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-06sunrpc: fix refcount leak for rpc auth modulesDaniel Kobras
If an auth module's accept op returns SVC_CLOSE, svc_process_common() enters a call path that does not call svc_authorise() before leaving the function, and thus leaks a reference on the auth module's refcount. Hence, make sure calls to svc_authenticate() and svc_authorise() are paired for all call paths, to make sure rpc auth modules can be unloaded. Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-06NFSD: Repair misuse of sv_lock in 5.10.16-rt30.Joe Korty
[ This problem is in mainline, but only rt has the chops to be able to detect it. ] Lockdep reports a circular lock dependency between serv->sv_lock and softirq_ctl.lock on system shutdown, when using a kernel built with CONFIG_PREEMPT_RT=y, and a nfs mount exists. This is due to the definition of spin_lock_bh on rt: local_bh_disable(); rt_spin_lock(lock); which forces a softirq_ctl.lock -> serv->sv_lock dependency. This is not a problem as long as _every_ lock of serv->sv_lock is a: spin_lock_bh(&serv->sv_lock); but there is one of the form: spin_lock(&serv->sv_lock); This is what is causing the circular dependency splat. The spin_lock() grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable. If later on in the critical region, someone does a local_bh_disable, we get a serv->sv_lock -> softirq_ctrl.lock dependency established. Deadlock. Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no exceptions. [ OK ] Stopped target NFS client services. Stopping Logout off all iSCSI sessions on shutdown... Stopping NFS server and services... [ 109.442380] [ 109.442385] ====================================================== [ 109.442386] WARNING: possible circular locking dependency detected [ 109.442387] 5.10.16-rt30 #1 Not tainted [ 109.442389] ------------------------------------------------------ [ 109.442390] nfsd/1032 is trying to acquire lock: [ 109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270 [ 109.442405] [ 109.442405] but task is already holding lock: [ 109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442415] [ 109.442415] which lock already depends on the new lock. [ 109.442415] [ 109.442416] [ 109.442416] the existing dependency chain (in reverse order) is: [ 109.442417] [ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}: [ 109.442421] rt_spin_lock+0x2b/0xc0 [ 109.442428] svc_add_new_perm_xprt+0x42/0xa0 [ 109.442430] svc_addsock+0x135/0x220 [ 109.442434] write_ports+0x4b3/0x620 [ 109.442438] nfsctl_transaction_write+0x45/0x80 [ 109.442440] vfs_write+0xff/0x420 [ 109.442444] ksys_write+0x4f/0xc0 [ 109.442446] do_syscall_64+0x33/0x40 [ 109.442450] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 109.442454] [ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 109.442457] __lock_acquire+0x1264/0x20b0 [ 109.442463] lock_acquire+0xc2/0x400 [ 109.442466] rt_spin_lock+0x2b/0xc0 [ 109.442469] __local_bh_disable_ip+0xd9/0x270 [ 109.442471] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442474] svc_close_list+0x60/0x90 [ 109.442476] svc_close_net+0x49/0x1a0 [ 109.442478] svc_shutdown_net+0x12/0x40 [ 109.442480] nfsd_destroy+0xc5/0x180 [ 109.442482] nfsd+0x1bc/0x270 [ 109.442483] kthread+0x194/0x1b0 [ 109.442487] ret_from_fork+0x22/0x30 [ 109.442492] [ 109.442492] other info that might help us debug this: [ 109.442492] [ 109.442493] Possible unsafe locking scenario: [ 109.442493] [ 109.442493] CPU0 CPU1 [ 109.442494] ---- ---- [ 109.442495] lock(&serv->sv_lock); [ 109.442496] lock((softirq_ctrl.lock).lock); [ 109.442498] lock(&serv->sv_lock); [ 109.442499] lock((softirq_ctrl.lock).lock); [ 109.442501] [ 109.442501] *** DEADLOCK *** [ 109.442501] [ 109.442501] 3 locks held by nfsd/1032: [ 109.442503] #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270 [ 109.442508] #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442512] #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0 [ 109.442518] [ 109.442518] stack backtrace: [ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1 [ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015 [ 109.442524] Call Trace: [ 109.442527] dump_stack+0x77/0x97 [ 109.442533] check_noncircular+0xdc/0xf0 [ 109.442546] __lock_acquire+0x1264/0x20b0 [ 109.442553] lock_acquire+0xc2/0x400 [ 109.442564] rt_spin_lock+0x2b/0xc0 [ 109.442570] __local_bh_disable_ip+0xd9/0x270 [ 109.442573] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442577] svc_close_list+0x60/0x90 [ 109.442581] svc_close_net+0x49/0x1a0 [ 109.442585] svc_shutdown_net+0x12/0x40 [ 109.442588] nfsd_destroy+0xc5/0x180 [ 109.442590] nfsd+0x1bc/0x270 [ 109.442595] kthread+0x194/0x1b0 [ 109.442600] ret_from_fork+0x22/0x30 [ 109.518225] nfsd: last server has exited, flushing export cache [ OK ] Stopped NFSv4 ID-name mapping service. [ OK ] Stopped GSSAPI Proxy Daemon. [ OK ] Stopped NFS Mount Daemon. [ OK ] Stopped NFS status monitor for NFSv2/3 locking.. Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown") Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-06svcrdma: disable timeouts on rdma backchannelTimo Rothenpieler
This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, getting some async operations like server side copying getting stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-05CIPSO: Fix unaligned memory access in cipso_v4_gentag_hdrSergey Nazarov
We need to use put_unaligned when writing 32-bit DOI value in cipso_v4_gentag_hdr to avoid unaligned memory access. v2: unneeded type cast removed as Ondrej Mosnacek suggested. Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04cipso,calipso: resolve a number of problems with the DOI refcountsPaul Moore
The current CIPSO and CALIPSO refcounting scheme for the DOI definitions is a bit flawed in that we: 1. Don't correctly match gets/puts in netlbl_cipsov4_list(). 2. Decrement the refcount on each attempt to remove the DOI from the DOI list, only removing it from the list once the refcount drops to zero. This patch fixes these problems by adding the missing "puts" to netlbl_cipsov4_list() and introduces a more conventional, i.e. not-buggy, refcounting mechanism to the DOI definitions. Upon the addition of a DOI to the DOI list, it is initialized with a refcount of one, removing a DOI from the list removes it from the list and drops the refcount by one; "gets" and "puts" behave as expected with respect to refcounts, increasing and decreasing the DOI's refcount by one. Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts") Fixes: d7cce01504a0 ("netlabel: Add support for removing a CALIPSO DOI.") Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: free resources when the port number is mismatchedGeliang Tang
When the port number is mismatched with the announced ones, use 'goto dispose_child' to free the resources instead of using 'goto out'. This patch also moves the port number checking code in subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx will fail in dispose_child. Fixes: 5bc56388c74f ("mptcp: add port number check for MP_JOIN") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: fix missing wakeupPaolo Abeni
__mptcp_clean_una() can free write memory and should wake-up user-space processes when needed. When such function is invoked by the MPTCP receive path, the wakeup is not needed, as the TCP stack will later trigger subflow_write_space which will do the wakeup as needed. Other __mptcp_clean_una() call sites need an additional wakeup check Let's bundle the relevant code in a new helper and use it. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165 Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions") Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: fix race in release_cbPaolo Abeni
If we receive a MPTCP_PUSH_PENDING even from a subflow when mptcp_release_cb() is serving the previous one, the latter will be delayed up to the next release_sock(msk). Address the issue implementing a test/serve loop for such event. Additionally rename the push helper to __mptcp_push_pending() to be more consistent with the existing code. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: factor out __mptcp_retrans helper()Paolo Abeni
Will simplify the following patch, no functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: reset 'first' and ack_hint on subflow closeFlorian Westphal
Just like with last_snd, we have to NULL 'first' on subflow close. ack_hint isn't strictly required (its never dereferenced), but better to clear this explicitly as well instead of making it an exception. msk->first is dereferenced unconditionally at accept time, but at that point the ssk is not on the conn_list yet -- this means worker can't see it when iterating the conn_list. Reported-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: dispose initial struct socket when its subflow is closedFlorian Westphal
Christoph Paasch reported following crash: dst_release underflow WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175 CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70 Call Trace: rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503 rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612 __mkroute_output net/ipv4/route.c:2484 [inline] ... The worker leaves msk->subflow alone even when it happened to close the subflow ssk associated with it. Fixes: 866f26f2a9c33b ("mptcp: always graft subflow socket to parent") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157 Reported-by: Christoph Paasch <cpaasch@apple.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: fix memory accounting on allocation errorPaolo Abeni
In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136 Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: put subflow sock on connect errorFlorian Westphal
mptcp_add_pending_subflow() performs a sock_hold() on the subflow, then adds the subflow to the join list. Without a sock_put the subflow sk won't be freed in case connect() fails. unreferenced object 0xffff88810c03b100 (size 3000): [..] sk_prot_alloc.isra.0+0x2f/0x110 sk_alloc+0x5d/0xc20 inet6_create+0x2b7/0xd30 __sock_create+0x17f/0x410 mptcp_subflow_create_socket+0xff/0x9c0 __mptcp_subflow_connect+0x1da/0xaf0 mptcp_pm_nl_work+0x6e0/0x1120 mptcp_worker+0x508/0x9a0 Fixes: 5b950ff4331ddda ("mptcp: link MPC subflow into msk only after accept") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04mptcp: reset last_snd on subflow closeFlorian Westphal
Send logic caches last active subflow in the msk, so it needs to be cleared when the cached subflow is closed. Fixes: d5f49190def61c ("mptcp: allow picking different xmit subflows") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155 Reported-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04net: sched: avoid duplicates in classes dumpMaximilian Heyne
This is a follow up of commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") which has fixed the issue only for the qdisc dump. The duplicate printing also occurs when dumping the classes via tc class show dev eth0 Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04nexthop: Do not flush blackhole nexthops when loopback goes downIdo Schimmel
As far as user space is concerned, blackhole nexthops do not have a nexthop device and therefore should not be affected by the administrative or carrier state of any netdev. However, when the loopback netdev goes down all the blackhole nexthops are flushed. This happens because internally the kernel associates blackhole nexthops with the loopback netdev. This behavior is both confusing to those not familiar with kernel internals and also diverges from the legacy API where blackhole IPv4 routes are not flushed when the loopback netdev goes down: # ip route add blackhole 198.51.100.0/24 # ip link set dev lo down # ip route show 198.51.100.0/24 blackhole 198.51.100.0/24 Blackhole IPv6 routes are flushed, but at least user space knows that they are associated with the loopback netdev: # ip -6 route show 2001:db8:1::/64 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium Fix this by only flushing blackhole nexthops when the loopback netdev is unregistered. Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Donald Sharp <sharpd@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04net: sctp: trivial: fix typo in commentDrew Fustini
Fix typo of 'overflow' for comment in sctp_tsnmap_check(). Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Drew Fustini <drew@beagleboard.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04netfilter: nftables: bogus check for netlink portID with table ownerPablo Neira Ayuso
The existing branch checks for 0 != table->nlpid which always evaluates true for tables that have an owner. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-04netfilter: nftables: fix possible double hook unregistration with table ownerPablo Neira Ayuso
Skip hook unregistration of owner tables from the netns exit path, nft_rcv_nl_event() unregisters the table hooks before tearing down the table content. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-03rtnetlink: using dev_base_seq from target netzhang kai
Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03net: 9p: advance iov on empty readJisheng Zhang
I met below warning when cating a small size(about 80bytes) txt file on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we miss iov_iter_advance() if the read count is 0 for zerocopy case, so we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is full. Fix it by removing the exception for 0 to ensure to call iov_iter_advance() even on empty read for zerocopy case. [ 8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40 [ 8.280028] Modules linked in: [ 8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6 [ 8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40 [ 8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b [ 8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246 [ 8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000 [ 8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90 [ 8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058 [ 8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050 [ 8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600 [ 8.285439] FS: 00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 8.285844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0 [ 8.286710] Call Trace: [ 8.288279] generic_file_splice_read+0x31/0x1a0 [ 8.289273] ? do_splice_to+0x2f/0x90 [ 8.289511] splice_direct_to_actor+0xcc/0x220 [ 8.289788] ? pipe_to_sendpage+0xa0/0xa0 [ 8.290052] do_splice_direct+0x8b/0xd0 [ 8.290314] do_sendfile+0x1ad/0x470 [ 8.290576] do_syscall_64+0x2d/0x40 [ 8.290818] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.291409] RIP: 0033:0x7f24fd7dca0a [ 8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8 [ 8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028 [ 8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a [ 8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 [ 8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003 [ 8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 [ 8.295782] ---[ end trace 63317af81b3ca24b ]--- Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03net: l2tp: reduce log level of messages in receive path, add counter insteadMatthias Schiffer
Commit 5ee759cda51b ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose. In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP). Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets. With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited(). [1] https://github.com/wlanslovenija/tunneldigger/issues/160 Fixes: 5ee759cda51b ("l2tp: use standard API for warning log messages") Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03xfrm: Use actual socket sk instead of skb socket for xfrm_output_resumeEvan Nimmo
A situation can occur where the interface bound to the sk is different to the interface bound to the sk attached to the skb. The interface bound to the sk is the correct one however this information is lost inside xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume instead. This assumes that the sk bound interface and the bound interface attached to the sk within the skb are the same which can lead to lookup failures inside ip_route_me_harder resulting in the packet being dropped. We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the global VRF however we have an encapsulated dot1q tunnel interface that is within a different VRF. We also have a mangle rule that marks the packets causing them to be processed inside ip_route_me_harder. Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this worked fine as the sk attached to the skb was changed from the dot1q encapsulated interface to the sk for the tunnel which meant the interface bound to the sk and the interface bound to the skb were identical. Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk when routing harder") fixed some of these issues however a similar problem existed in the xfrm code. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-03vti6: fix ipv4 pmtu check to honor ip header dfEyal Birger
Frag needed should only be sent if the header enables DF. This fix allows IPv4 packets larger than MTU to pass the vti6 interface and be fragmented after encapsulation, aligning behavior with non-vti6 xfrm. Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-03vti: fix ipv4 pmtu check to honor ip header dfEyal Birger
Frag needed should only be sent if the header enables DF. This fix allows packets larger than MTU to pass the vti interface and be fragmented after encapsulation, aligning behavior with non-vti xfrm. Fixes: d6af1a31cc72 ("vti: Add pmtu handling to vti_xmit.") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-02net: ieee802154: nl-mac: fix check on panidAlexander Aring
This patch fixes a null pointer derefence for panid handle by move the check for the netlink variable directly before accessing them. Reported-by: syzbot+d4c07de0144f6f63be3a@syzkaller.appspotmail.com Signed-off-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20210228151817.95700-4-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2021-03-02netfilter: nftables: disallow updates on table ownershipPablo Neira Ayuso
Disallow updating the ownership bit on an existing table: Do not allow to grab ownership on an existing table. Do not allow to drop ownership on an existing table. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-01tcp: add sanity tests to TCP_QUEUE_SEQEric Dumazet
Qingyu Li reported a syzkaller bug where the repro changes RCV SEQ _after_ restoring data in the receive queue. mprotect(0x4aa000, 12288, PROT_READ) = 0 mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000 mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000 mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000 socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0 setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0 setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0 recvfrom(3, NULL, 20, 0, NULL, NULL) = -1 ECONNRESET (Connection reset by peer) syslog shows: [ 111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0 [ 111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0 This should not be allowed. TCP_QUEUE_SEQ should only be used when queues are empty. This patch fixes this case, and the tx path as well. Fixes: ee9952831cfd ("tcp: Initial repair mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005 Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: dsa: tag_mtk: fix 802.1ad VLAN egressDENG Qingfang
A different TPID bit is used for 802.1ad VLAN frames. Reported-by: Ilario Gelmetti <iochesonome@gmail.com> Fixes: f0af34317f4b ("net: dsa: mediatek: combine MediaTek tag with VLAN tag") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: expand textsearch ts_state to fit skb_seq_stateWillem de Bruijn
The referenced commit expands the skb_seq_state used by skb_find_text with a 4B frag_off field, growing it to 48B. This exceeds container ts_state->cb, causing a stack corruption: [ 73.238353] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: skb_find_text+0xc5/0xd0 [ 73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4 [ 73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 73.260078] Call Trace: [ 73.264677] dump_stack+0x57/0x6a [ 73.267866] panic+0xf6/0x2b7 [ 73.270578] ? skb_find_text+0xc5/0xd0 [ 73.273964] __stack_chk_fail+0x10/0x10 [ 73.277491] skb_find_text+0xc5/0xd0 [ 73.280727] string_mt+0x1f/0x30 [ 73.283639] ipt_do_table+0x214/0x410 The struct is passed between skb_find_text and its callbacks skb_prepare_seq_read, skb_seq_read and skb_abort_seq read through the textsearch interface using TS_SKB_CB. I assumed that this mapped to skb->cb like other .._SKB_CB wrappers. skb->cb is 48B. But it maps to ts_state->cb, which is only 40B. skb->cb was increased from 40B to 48B after ts_state was introduced, in commit 3e3850e989c5 ("[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder"). Increase ts_state.cb[] to 48 to fit the struct. Also add a BUILD_BUG_ON to avoid a repeat. The alternative is to directly add a dependency from textsearch onto linux/skbuff.h, but I think the intent is textsearch to have no such dependencies on its callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911 Fixes: 97550f6fa592 ("net: compound page support in skb_seq_read") Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01inetpeer: use div64_ul() and clamp_val() calculate inet_peer_thresholdYejune Deng
In inet_initpeers(), struct inet_peer on IA32 uses 128 bytes in nowdays. Get rid of the cascade and use div64_ul() and clamp_val() calculate that will not need to be adjusted in the future as suggested by Eric Dumazet. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net/qrtr: fix __netdev_alloc_skb callPavel Skripkin
syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER. It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(), which tries to allocate skb. Since the value comes from the untrusted source there is no need to raise a warning in __alloc_pages_nodemask(). [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014 Call Trace: __alloc_pages include/linux/gfp.h:511 [inline] __alloc_pages_node include/linux/gfp.h:524 [inline] alloc_pages_node include/linux/gfp.h:538 [inline] kmalloc_large_node+0x60/0x110 mm/slub.c:3999 __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496 __kmalloc_reserve net/core/skbuff.c:150 [inline] __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210 __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446 netdev_alloc_skb include/linux/skbuff.h:2832 [inline] qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442 qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write+0x426/0x650 fs/read_write.c:518 vfs_write+0x791/0xa30 fs/read_write.c:605 ksys_write+0x12d/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: always use icmp{,v6}_ndo_send from ndo_start_xmitJason A. Donenfeld
There were a few remaining tunnel drivers that didn't receive the prior conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to memory corrution (see ee576c47db60 ("net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending") for details), there's even more imperative to have these all converted. So this commit goes through the remaining cases that I could find and does a boring translation to the ndo variety. The Fixes: line below is the merge that originally added icmp{,v6}_ ndo_send and converted the first batch of icmp{,v6}_send users. The rationale then for the change applies equally to this patch. It's just that these drivers were left out of the initial conversion because these network devices are hiding in net/ rather than in drivers/net/. Cc: Florian Westphal <fw@strlen.de> Cc: Willem de Bruijn <willemb@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: 803381f9f117 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: dsa: tag_rtl4_a: fix egress tagsDENG Qingfang
Commit 86dd9868b878 has several issues, but was accepted too soon before anyone could take a look. - Double free. dsa_slave_xmit() will free the skb if the xmit function returns NULL, but the skb is already freed by eth_skb_pad(). Use __skb_put_padto() to avoid that. - Unnecessary allocation. It has been done by DSA core since commit a3b0b6479700. - A u16 pointer points to skb data. It should be __be16 for network byte order. - Typo in comments. "numer" -> "number". Fixes: 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-28net: Fix gro aggregation for udp encaps with zero csumDaniel Borkmann
We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the csum for the UDP header itself is 0. In that case, GRO aggregation does not take place on the phys dev, but instead is deferred to the vxlan/geneve driver (see trace below). The reason is essentially that GRO aggregation bails out in udp_gro_receive() for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e, others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE and napi context has csum_valid set. This is however not the case for zero UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false). At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches on !uh->check ^ !uh2->check as part to determine candidates for aggregation, so it certainly is expected to handle zero csums in udp_gro_receive(). The purpose of the check added via 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary") seems to catch bad csum and stop aggregation right away. One way to fix aggregation in the zero case is to only perform the !csum_valid check in udp_gro_receive() if uh->check is infact non-zero. Before: [...] swapper 0 [008] 731.946506: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100400 len=1500 (1) swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100200 len=1500 swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101100 len=1500 swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101700 len=1500 swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101b00 len=1500 swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100600 len=1500 swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100f00 len=1500 swapper 0 [008] 731.946509: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100a00 len=1500 swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100500 len=1500 swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100700 len=1500 swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101d00 len=1500 (2) swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101000 len=1500 swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101c00 len=1500 swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101400 len=1500 swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100e00 len=1500 swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101600 len=1500 swapper 0 [008] 731.946521: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100800 len=774 swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1) swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112 (2) [...] # netperf -H 10.55.10.4 -t TCP_STREAM -l 20 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 20.01 13129.24 After: [...] swapper 0 [026] 521.862641: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479000 len=11286 (1) swapper 0 [026] 521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1) swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d478500 len=2898 (2) swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479f00 len=8490 (3) swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848 (2) swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440 (3) [...] # netperf -H 10.55.10.4 -t TCP_STREAM -l 20 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 20.01 24576.53 Fixes: 57c67ff4bd92 ("udp: additional GRO support") Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28ethtool: fix the check logic of at least one channel for RX/TXYinjun Zhang
The command "ethtool -L <intf> combined 0" may clean the RX/TX channel count and skip the error path, since the attrs tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT] are NULL in this case when recent ethtool is used. Tested using ethtool v5.10. Fixes: 7be92514b99c ("ethtool: check if there is at least one channel for TX/RX in the core") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Link: https://lore.kernel.org/r/20210225125102.23989-1-simon.horman@netronome.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-28netfilter: x_tables: gpf inside xt_find_revision()Vasily Averin
nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload and cause host to crash: general protection fault: 0000 [#1] Modules linked in: ... [last unloaded: xt_cluster] CPU: 0 PID: 542455 Comm: iptables RIP: 0010:[<ffffffff8ffbd518>] [<ffffffff8ffbd518>] strcmp+0x18/0x40 RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111 R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100 (VvS: %R15 -- &xt_match, %RDI -- &xt_match.name, xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list) Call Trace: [<ffffffff902ccf44>] match_revfn+0x54/0xc0 [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0 [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0 [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables] [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70 [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100 [<ffffffff903039b5>] raw_getsockopt+0x25/0x50 [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20 [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0 [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a Fixes: 656caff20e1 ("netfilter 04/09: x_tables: fix match/target revision lookup") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>