summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-01-15bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chainingJohn Fastabend
Its possible through a set of push, pop, apply helper calls to construct a skmsg, which is just a ring of scatterlist elements, with the start value larger than the end value. For example, end start |_0_|_1_| ... |_n_|_n+1_| Where end points at 1 and start points and n so that valid elements is the set {n, n+1, 0, 1}. Currently, because we don't build the correct chain only {n, n+1} will be sent. This adds a check and sg_chain call to correctly submit the above to the crypto and tls send path. Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
2020-01-15bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt bufJohn Fastabend
It is possible to build a plaintext buffer using push helper that is larger than the allocated encrypt buffer. When this record is pushed to crypto layers this can result in a NULL pointer dereference because the crypto API expects the encrypt buffer is large enough to fit the plaintext buffer. Kernel splat below. To resolve catch the cases this can happen and split the buffer into two records to send individually. Unfortunately, there is still one case to handle where the split creates a zero sized buffer. In this case we merge the buffers and unmark the split. This happens when apply is zero and user pushed data beyond encrypt buffer. This fixes the original case as well because the split allocated an encrypt buffer larger than the plaintext buffer and the merge simply moves the pointers around so we now have a reference to the new (larger) encrypt buffer. Perhaps its not ideal but it seems the best solution for a fixes branch and avoids handling these two cases, (a) apply that needs split and (b) non apply case. The are edge cases anyways so optimizing them seems not necessary unless someone wants later in next branches. [ 306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008 [...] [ 306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0 [...] [ 306.770350] Call Trace: [ 306.770956] scatterwalk_map_and_copy+0x6c/0x80 [ 306.772026] gcm_enc_copy_hash+0x4b/0x50 [ 306.772925] gcm_hash_crypt_remain_continue+0xef/0x110 [ 306.774138] gcm_hash_crypt_continue+0xa1/0xb0 [ 306.775103] ? gcm_hash_crypt_continue+0xa1/0xb0 [ 306.776103] gcm_hash_assoc_remain_continue+0x94/0xa0 [ 306.777170] gcm_hash_assoc_continue+0x9d/0xb0 [ 306.778239] gcm_hash_init_continue+0x8f/0xa0 [ 306.779121] gcm_hash+0x73/0x80 [ 306.779762] gcm_encrypt_continue+0x6d/0x80 [ 306.780582] crypto_gcm_encrypt+0xcb/0xe0 [ 306.781474] crypto_aead_encrypt+0x1f/0x30 [ 306.782353] tls_push_record+0x3b9/0xb20 [tls] [ 306.783314] ? sk_psock_msg_verdict+0x199/0x300 [ 306.784287] bpf_exec_tx_verdict+0x3f2/0x680 [tls] [ 306.785357] tls_sw_sendmsg+0x4a3/0x6a0 [tls] test_sockmap test signature to trigger bug, [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,): Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
2020-01-15bpf: Sockmap/tls, msg_push_data may leave end mark in placeJohn Fastabend
Leaving an incorrect end mark in place when passing to crypto layer will cause crypto layer to stop processing data before all data is encrypted. To fix clear the end mark on push data instead of expecting users of the helper to clear the mark value after the fact. This happens when we push data into the middle of a skmsg and have room for it so we don't do a set of copies that already clear the end flag. Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com
2020-01-15bpf: Sockmap, skmsg helper overestimates push, pull, and pop boundsJohn Fastabend
In the push, pull, and pop helpers operating on skmsg objects to make data writable or insert/remove data we use this bounds check to ensure specified data is valid, /* Bounds checks: start and pop must be inside message */ if (start >= offset + l || last >= msg->sg.size) return -EINVAL; The problem here is offset has already included the length of the current element the 'l' above. So start could be past the end of the scatterlist element in the case where start also points into an offset on the last skmsg element. To fix do the accounting slightly different by adding the length of the previous entry to offset at the start of the iteration. And ensure its initialized to zero so that the first iteration does nothing. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data") Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com
2020-01-15bpf: Sockmap/tls, push write_space updates through ulp updatesJohn Fastabend
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state and call tcp_update_ulp() to push updates to TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the psock stored previous op. This may or may not be correct so to ensure we don't overwrite the TLS write space hook pass this field to the ULP and have it fixup the ctx. This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because write_space TLS hook was added around the same time. Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
2020-01-15bpf: Sockmap, ensure sock lock held during tear downJohn Fastabend
The sock_map_free() and sock_hash_free() paths used to delete sockmap and sockhash maps walk the maps and destroy psock and bpf state associated with the socks in the map. When done the socks no longer have BPF programs attached and will function normally. This can happen while the socks in the map are still "live" meaning data may be sent/received during the walk. Currently, though we don't take the sock_lock when the psock and bpf state is removed through this path. Specifically, this means we can be writing into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc. while they are also being called from the networking side. This is not safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we believed it was safe. Further its not clear to me its even a good idea to try and do this on "live" sockets while networking side might also be using the socket. Instead of trying to reason about using the socks from both sides lets realize that every use case I'm aware of rarely deletes maps, in fact kubernetes/Cilium case builds map at init and never tears it down except on errors. So lets do the simple fix and grab sock lock. This patch wraps sock deletes from maps in sock lock and adds some annotations so we catch any other cases easier. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
2020-01-15Merge tag 'batadv-net-for-davem-20200114' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix DAT candidate selection on little endian systems, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15tcp: fix marked lost packets not being retransmittedPengcheng Yang
When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15Merge tag 'mac80211-for-net-2020-01-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few fixes: * -O3 enablement fallout, thanks to Arnd who ran this * fixes for a few leaks, thanks to Felix * channel 12 regulatory fix for custom regdomains * check for a crash reported by syzbot (NULL function is called on drivers that don't have it) * fix TKIP replay protection after setup with some APs (from Jouni) * restrict obtaining some mesh data to avoid WARN_ONs * fix deadlocks with auto-disconnect (socket owner) * fix radar detection events with multiple devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15xfrm: support output_mark for offload ESP packetsUlrich Weber
Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-15cfg80211: fix page refcount issue in A-MSDU decapFelix Fietkau
The fragments attached to a skb can be part of a compound page. In that case, page_ref_inc will increment the refcount for the wrong page. Fix this by using get_page instead, which calls page_ref_inc on the compound head and also checks for overflow. Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: check for set_wiphy_paramsJohannes Berg
Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in cfg80211_cqm_rssi_updateFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix memory leak in nl80211_probe_mesh_linkFelix Fietkau
The per-tid statistics need to be released after the call to rdev_get_station Cc: stable@vger.kernel.org Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: fix deadlocks in autodisconnect workMarkus Theil
Use methods which do not try to acquire the wdev lock themselves. Cc: stable@vger.kernel.org Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk") Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: wext: avoid gcc -O3 warningArnd Bergmann
After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen
TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15cfg80211: Fix radar event during another phy CACOrr Mazor
In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <Orr.Mazor@tandemg.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat
Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <xiaohua.luo@nxp.com> Signed-off-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-14hv_sock: Remove the accept port restrictionSunil Muthuswamy
Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14xprtrdma: Fix oops in Receive handler after device removalChuck Lever
Since v5.4, a device removal occasionally triggered this oops: Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219 Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page Dec 2 17:13:53 manet kernel: PGD 0 P4D 0 Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883 Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015 Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma] Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0 Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246 Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048 Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001 Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000 Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428 Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010 Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0 Dec 2 17:13:53 manet kernel: Call Trace: Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core] Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core] Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9 Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74 Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30 The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that is still pointing to the old ib_device, which has been freed. The only way that is possible is if this rpcrdma_rep was not destroyed by rpcrdma_ia_remove. Debugging showed that was indeed the case: this rpcrdma_rep was still in use by a completing RPC at the time of the device removal, and thus wasn't on the rep free list. So, it was not found by rpcrdma_reps_destroy(). The fix is to introduce a list of all rpcrdma_reps so that they all can be found when a device is removed. That list is used to perform only regbuf DMA unmapping, replacing that call to rpcrdma_reps_destroy(). Meanwhile, to prevent corruption of this list, I've moved the destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs(). rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus protecting the rb_all_reps list. Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-14xprtrdma: Fix completion wait during device removalChuck Lever
I've found that on occasion, "rmmod <dev>" will hang while if an NFS is under load. Ensure that ri_remove_done is initialized only just before the transport is woken up to force a close. This avoids the completion possibly getting initialized again while the CM event handler is waiting for a wake-up. Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-14xprtrdma: Fix create_qp crash on device unloadChuck Lever
On device re-insertion, the RDMA device driver crashes trying to set up a new QP: Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0 Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page Nov 27 16:32:06 manet kernel: PGD 0 P4D 0 Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852 Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015 Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma] Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12 Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00 Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046 Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000 Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0 Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000 Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8 Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000 Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000 Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0 Nov 27 16:32:06 manet kernel: Call Trace: Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib] Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139 Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib] The fix is to copy the qp_init_attr struct that was just created by rpcrdma_ep_create() instead of using the one from the previous connection instance. Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-14xfrm: interface: do not confirm neighbor when do pmtu updateXu Wang
When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-14xfrm interface: fix packet tx through bpf_redirect()Nicolas Dichtel
With an ebpf program that redirects packets through a xfrm interface, packets are dropped because no dst is attached to skb. This could also be reproduced with an AF_PACKET socket, with the following python script (xfrm1 is a xfrm interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('xfrm1', 0x800, 0, 0)) It was also not possible to send an ip packet through an AF_PACKET socket because a LL header was expected. Let's remove those LL header constraints. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-14vti[6]: fix packet tx through bpf_redirect()Nicolas Dichtel
With an ebpf program that redirects packets through a vti[6] interface, the packets are dropped because no dst is attached. This could also be reproduced with an AF_PACKET socket, with the following python script (vti1 is an ip_vti interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('vti1', 0x800, 0, 0)) Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-01-13netfilter: arp_tables: init netns pointer in xt_tgdtor_param structFlorian Westphal
An earlier commit (1b789577f655060d98d20e, "netfilter: arp_tables: init netns pointer in xt_tgchk_param struct") fixed missing net initialization for arptables, but turns out it was incomplete. We can get a very similar struct net NULL deref during error unwinding: general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:xt_rateest_put+0xa1/0x440 net/netfilter/xt_RATEEST.c:77 xt_rateest_tg_destroy+0x72/0xa0 net/netfilter/xt_RATEEST.c:175 cleanup_entry net/ipv4/netfilter/arp_tables.c:509 [inline] translate_table+0x11f4/0x1d80 net/ipv4/netfilter/arp_tables.c:587 do_replace net/ipv4/netfilter/arp_tables.c:981 [inline] do_arpt_set_ctl+0x317/0x650 net/ipv4/netfilter/arp_tables.c:1461 Also init the netns pointer in xt_tgdtor_param struct. Fixes: add67461240c1d ("netfilter: add struct net * to target parameters") Reported-by: syzbot+91bdd8eece0f6629ec8b@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-13netfilter: fix a use-after-free in mtype_destroy()Cong Wang
map->members is freed by ip_set_free() right before using it in mtype_ext_cleanup() again. So we just have to move it down. Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function") Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-11devlink: correct misspelling of snapshotJacob Keller
The function to obtain a unique snapshot id was mistakenly typo'd as devlink_region_shapshot_id_get. Fix this typo by renaming the function and all of its users. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10devlink: Wait longer before warning about unset port typeIdo Schimmel
The commit cited below causes devlink to emit a warning if a type was not set on a devlink port for longer than 30 seconds to "prevent misbehavior of drivers". This proved to be problematic when unregistering the backing netdev. The flow is always: devlink_port_type_clear() // schedules the warning unregister_netdev() // blocking devlink_port_unregister() // cancels the warning The call to unregister_netdev() can block for long periods of time for various reasons: RTNL lock is contended, large amounts of configuration to unroll following dismantle of the netdev, etc. This results in devlink emitting a warning despite the driver behaving correctly. In emulated environments (of future hardware) which are usually very slow, the warning can also be emitted during port creation as more than 30 seconds can pass between the time the devlink port is registered and when its type is set. In addition, syzbot has hit this warning [1] 1974 times since 07/11/19 without being able to produce a reproducer. Probably because reproduction depends on the load or other bugs (e.g., RTNL not being released). To prevent bogus warnings, increase the timeout to 1 hour. [1] https://syzkaller.appspot.com/bug?id=e99b59e9c024a666c9f7450dc162a4b74d09d9cb Fixes: 136bf27fc0e9 ("devlink: add warning in case driver does not set port type") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: syzbot+b0a18ed7b08b735d2f41@syzkaller.appspotmail.com Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10ipv4: Detect rollover in specific fib table dumpDavid Ahern
Sven-Haegar reported looping on fib dumps when 255.255.255.255 route has been added to a table. The looping is caused by the key rolling over from FFFFFFFF to 0. When dumping a specific table only, we need a means to detect when the table dump is done. The key and count saved to cb args are both 0 only at the start of the table dump. If key is 0 and count > 0, then we are in the rollover case. Detect and return to avoid looping. This only affects dumps of a specific table; for dumps of all tables (the case prior to the change in the Fixes tag) inet_dump_fib moved the entry counter to the next table and reset the cb args used by fib_table_dump and fn_trie_dump_leaf, so the rollover ffffffff back to 0 did not cause looping with the dumps. Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps") Reported-by: Sven-Haegar Koch <haegar@sdinet.de> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10net/tls: fix async operationJakub Kicinski
Mallesham reports the TLS with async accelerator was broken by commit d10523d0b3d7 ("net/tls: free the record on encryption error") because encryption can return -EINPROGRESS in such setups, which should not be treated as an error. The error is also present in the BPF path (likely copied from there). Reported-by: Mallesham Jatharakonda <mallesham.jatharakonda@oneconvergence.com> Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10net/tls: avoid spurious decryption error with HW resyncJakub Kicinski
When device loses sync mid way through a record - kernel has to re-encrypt the part of the record which the device already decrypted to be able to decrypt and authenticate the record in its entirety. The re-encryption piggy backs on the decryption routine, but obviously because the partially decrypted record can't be authenticated crypto API returns an error which is then ignored by tls_device_reencrypt(). Commit 5c5ec6685806 ("net/tls: add TlsDecryptError stat") added a statistic to count decryption errors, this statistic can't be incremented when we see the expected re-encryption error. Move the inc to the caller. Reported-and-tested-by: David Beckett <david.beckett@netronome.com> Fixes: 5c5ec6685806 ("net/tls: add TlsDecryptError stat") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10net: bpf: Don't leak time wait and request socketsLorenz Bauer
It's possible to leak time wait and request sockets via the following BPF pseudo code:   sk = bpf_skc_lookup_tcp(...) if (sk) bpf_sk_release(sk) If sk->sk_state is TCP_NEW_SYN_RECV or TCP_TIME_WAIT the refcount taken by bpf_skc_lookup_tcp is not undone by bpf_sk_release. This is because sk_flags is re-used for other data in both kinds of sockets. The check !sock_flag(sk, SOCK_RCU_FREE) therefore returns a bogus result. Check that sk_flags is valid by calling sk_fullsock. Skip checking SOCK_RCU_FREE if we already know that sk is not a full socket. Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper") Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110132336.26099-1-lmb@cloudflare.com
2020-01-09net/x25: fix nonblocking connectMartin Schiller
This patch fixes 2 issues in x25_connect(): 1. It makes absolutely no sense to reset the neighbour and the connection state after a (successful) nonblocking call of x25_connect. This prevents any connection from being established, since the response (call accept) cannot be processed. 2. Any further calls to x25_connect() while a call is pending should simply return, instead of creating new Call Request (on different logical channels). This patch should also fix the "KASAN: null-ptr-deref Write in x25_connect" and "BUG: unable to handle kernel NULL pointer dereference in x25_connect" bugs reported by syzbot. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reported-by: syzbot+429c200ffc8772bfe070@syzkaller.appspotmail.com Reported-by: syzbot+eec0c87f31a7c3b66f7b@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09bpf/sockmap: Read psock ingress_msg before sk_receive_queueLingpeng Chen
Right now in tcp_bpf_recvmsg, sock read data first from sk_receive_queue if not empty than psock->ingress_msg otherwise. If a FIN packet arrives and there's also some data in psock->ingress_msg, the data in psock->ingress_msg will be purged. It is always happen when request to a HTTP1.0 server like python SimpleHTTPServer since the server send FIN packet after data is sent out. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: Arika Chen <eaglesora@gmail.com> Suggested-by: Arika Chen <eaglesora@gmail.com> Signed-off-by: Lingpeng Chen <forrest0579@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200109014833.18951-1-forrest0579@gmail.com
2020-01-08tipc: fix wrong connect() return codeTuong Lien
The current 'tipc_wait_for_connect()' function does a wait-loop for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the socket connecting has done. However, when the condition is met, it returns '0' even in the case the connecting is actually failed, the socket state is set to 'TIPC_DISCONNECTING' (e.g. when the server socket has closed..). This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and go ahead with building, sending a message, etc. but finally failed e.g. '-EPIPE'. This commit fixes the issue by changing the wait condition to the 'tipc_sk_connected(sk)', so the function will return '0' only when the connection is really established. Otherwise, either the socket 'sk_err' if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08tipc: fix link overflow issue at socket shutdownTuong Lien
When a socket is suddenly shutdown or released, it will reject all the unreceived messages in its receive queue. This applies to a connected socket too, whereas there is only one 'FIN' message required to be sent back to its peer in this case. In case there are many messages in the queue and/or some connections with such messages are shutdown at the same time, the link layer will easily get overflowed at the 'TIPC_SYSTEM_IMPORTANCE' backlog level because of the message rejections. As a result, the link will be taken down. Moreover, immediately when the link is re-established, the socket layer can continue to reject the messages and the same issue happens... The commit refactors the '__tipc_shutdown()' function to only send one 'FIN' in the situation mentioned above. For the connectionless case, it is unavoidable but usually there is no rejections for such socket messages because they are 'dest-droppable' by default. In addition, the new code makes the other socket states clear (e.g.'TIPC_LISTEN') and treats as a separate case to avoid misbehaving. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Missing netns context in arp_tables, from Florian Westphal. 2) Underflow in flowtable reference counter, from wenxu. 3) Fix incorrect ethernet destination address in flowtable offload, from wenxu. 4) Check for status of neighbour entry, from wenxu. 5) Fix NAT port mangling, from wenxu. 6) Unbind callbacks from destroy path to cleanup hardware properly on flowtable removal. 7) Fix missing casting statistics timestamp, add nf_flowtable_time_stamp and use it. 8) NULL pointer exception when timeout argument is null in conntrack dccp and sctp protocol helpers, from Florian Westphal. 9) Possible nul-dereference in ipset with IPSET_ATTR_LINENO, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is presentFlorian Westphal
The set uadt functions assume lineno is never NULL, but it is in case of ip_set_utest(). syzkaller managed to generate a netlink message that calls this with LINENO attr present: general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104 Call Trace: ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563 pass a dummy lineno storage, its easier than patching all set implementations. This seems to be a day-0 bug. Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-08netfilter: conntrack: dccp, sctp: handle null timeout argumentFlorian Westphal
The timeout pointer can be NULL which means we should modify the per-nets timeout instead. All do this, except sctp and dccp which instead give: general protection fault: 0000 [#1] PREEMPT SMP KASAN net/netfilter/nf_conntrack_proto_dccp.c:682 ctnl_timeout_parse_policy+0x150/0x1d0 net/netfilter/nfnetlink_cttimeout.c:67 cttimeout_default_set+0x150/0x1c0 net/netfilter/nfnetlink_cttimeout.c:368 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 Reported-by: syzbot+46a4ad33f345d1dd346e@syzkaller.appspotmail.com Fixes: c779e849608a8 ("netfilter: conntrack: remove get_timeout() indirection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-08net: sch_prio: When ungrafting, replace with FIFOPetr Machata
When a child Qdisc is removed from one of the PRIO Qdisc's bands, it is replaced unconditionally by a NOOP qdisc. As a result, any traffic hitting that band gets dropped. That is incorrect--no Qdisc was explicitly added when PRIO was created, and after removal, none should have to be added either. Fix PRIO by first attempting to create a default Qdisc and only falling back to noop when that fails. This pattern of attempting to create an invisible FIFO, using NOOP only as a fallback, is also seen in other Qdiscs. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08pkt_sched: fq: do not accept silly TCA_FQ_QUANTUMEric Dumazet
As diagnosed by Florian : If TCA_FQ_QUANTUM is set to 0x80000000, fq_deueue() can loop forever in : if (f->credit <= 0) { f->credit += q->quantum; goto begin; } ... because f->credit is either 0 or -2147483648. Let's limit TCA_FQ_QUANTUM to no more than 1 << 20 : This max value should limit risks of breaking user setups while fixing this bug. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Florian Westphal <fw@strlen.de> Reported-by: syzbot+dc9071cc5a85950bdfce@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08tipc: remove meaningless assignment in MakefileMasahiro Yamada
There is no module named tipc_diag. The assignment to tipc_diag-y has no effect. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08tipc: do not add socket.o to tipc-y twiceMasahiro Yamada
net/tipc/Makefile adds socket.o twice. tipc-y += addr.o bcast.o bearer.o \ core.o link.o discover.o msg.o \ name_distr.o subscr.o monitor.o name_table.o net.o \ netlink.o netlink_compat.o node.o socket.o eth_media.o \ ^^^^^^^^ topsrv.o socket.o group.o trace.o ^^^^^^^^ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07vlan: vlan_changelink() should propagate errorsEric Dumazet
Both vlan_dev_change_flags() and vlan_dev_set_egress_priority() can return an error. vlan_changelink() should not ignore them. Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07vlan: fix memory leak in vlan_dev_set_egress_priorityEric Dumazet
There are few cases where the ndo_uninit() handler might be not called if an error happens while device is initialized. Since vlan_newlink() calls vlan_changelink() before trying to register the netdevice, we need to make sure vlan_dev_uninit() has been called at least once, or we might leak allocated memory. BUG: memory leak unreferenced object 0xffff888122a206c0 (size 32): comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s) hex dump (first 32 bytes): 00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00 ......as........ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline] [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline] [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549 [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline] [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194 [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126 [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181 [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305 [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363 [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424 [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline] [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline] [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline] [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixe: 07b5b17e157b ("[VLAN]: Use rtnl_link API") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLYXin Long
This patch is to fix a memleak caused by no place to free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY. This issue occurs when failing to process a cmd while there're still SCTP_CMD_REPLY cmds on the cmd seq with an allocated chunk in cmd->obj.chunk. So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on the cmd seq when any cmd returns error. While at it, also remove 'nomem' label. Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit errorYing Xue
syzbot found the following crash on: ===================================================== BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline] BUG: KMSAN: uninit-value in nlmsg_parse_deprecated include/net/netlink.h:706 [inline] BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215 CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245 __nlmsg_parse include/net/netlink.h:661 [inline] nlmsg_parse_deprecated include/net/netlink.h:706 [inline] __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215 tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308 tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline] tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] genl_family_rcv_msg net/netlink/genetlink.c:717 [inline] genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:745 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x444179 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20 R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline] kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x309/0xa50 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] nlmsg_new include/net/netlink.h:888 [inline] tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301 tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline] tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] genl_family_rcv_msg net/netlink/genetlink.c:717 [inline] genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:745 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ===================================================== The complaint above occurred because the memory region pointed by attrbuf variable was not initialized. To eliminate this warning, we use kcalloc() rather than kmalloc_array() to allocate memory for attrbuf. Reported-by: syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06netfilter: flowtable: add nf_flowtable_time_stampPablo Neira Ayuso
This patch adds nf_flowtable_time_stamp and updates the existing code to use it. This patch is also implicitly fixing up hardware statistic fetching via nf_flow_offload_stats() where casting to u32 is missing. Use nf_flow_timeout_delta() to fix this. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>