Age | Commit message (Collapse) | Author |
|
A B.A.T.M.A.N. V virtual interface has an OGM2 packet buffer which is
initialized using data from the netdevice notifier and other rtnetlink
related hooks. It is sent regularly via various slave interfaces of the
batadv virtual interface and in this process also modified (realloced) to
integrate additional state information via TVLV containers.
It must be avoided that the worker item is executed without a common lock
with the netdevice notifier/rtnetlink helpers. Otherwise it can either
happen that half modified data is sent out or the functions modifying the
OGM2 buffer try to access already freed memory regions.
Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A few more small things, nothing really stands out:
* minstrel improvements from Felix
* a TX aggregation simplification
* some additional capabilities for hwsim
* minor cleanups & docs updates
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing
to a separate function") moved attribute buffer allocation and attribute
parsing from genl_family_rcv_msg_doit() into a separate function
genl_family_rcv_msg_attrs_parse() which, unlike the previous code, calls
__nlmsg_parse() even if family->maxattr is 0 (i.e. the family does its own
parsing). The parser error is ignored and does not propagate out of
genl_family_rcv_msg_attrs_parse() but an error message ("Unknown attribute
type") is set in extack and if further processing generates no error or
warning, it stays there and is interpreted as a warning by userspace.
Dumpit requests are not affected as genl_family_rcv_msg_dumpit() bypasses
the call of genl_family_rcv_msg_attrs_parse() if family->maxattr is zero.
Move this logic inside genl_family_rcv_msg_attrs_parse() so that we don't
have to handle it in each caller.
v3: put the check inside genl_family_rcv_msg_attrs_parse()
v2: adjust also argument of genl_family_rcv_msg_attrs_free()
Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_zerocopy_receive() rounds down the zc->length a multiple of
PAGE_SIZE. This results in two issues:
- tcp_zerocopy_receive sets recv_skip_hint to the length of the
receive queue if the zc->length input is smaller than the
PAGE_SIZE, even though the data in receive queue could be
zerocopied.
- tcp_zerocopy_receive would set recv_skip_hint of 0, in cases
where we have a little bit of data after the perfectly-sized
packets.
To fix these issues, do not store the rounded down value in
zc->length. Round down the length passed to zap_page_range(),
and return min(inq, zc->length) when the zap_range is 0.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_wmem_queued while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
sk_wmem_queued_add() helper is added so that we can in
the future convert to ADD_ONCE() or equivalent if/when
available.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_sndbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_rcvbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There two places where we fetch tp->urg_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write side use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->snd_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->write_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->copied_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->rcv_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
syzbot reported :
BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
sock_poll+0xed/0x250 net/socket.c:1256
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
ep_send_events fs/eventpoll.c:1793 [inline]
ep_poll+0xe3/0x900 fs/eventpoll.c:1930
do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
__do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
__se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
__x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :
fastopen = tp->fastopen_rsk;
snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-10-12
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) a bunch of small fixes. Nothing critical.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
the UDP socket is being shut down, rxrpc_error_report() may get called to
deal with it after sk_user_data on the UDP socket has been cleared, leading
to a NULL pointer access when this local endpoint record gets accessed.
Fix this by just returning immediately if sk_user_data was NULL.
The oops looks like the following:
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
...
Call Trace:
? sock_queue_err_skb+0xbd/0xde
? __udp4_lib_err+0x313/0x34d
__udp4_lib_err+0x313/0x34d
icmp_unreach+0x1ee/0x207
icmp_rcv+0x25b/0x28f
ip_protocol_deliver_rcu+0x95/0x10e
ip_local_deliver+0xe9/0x148
__netif_receive_skb_one_core+0x52/0x6e
process_backlog+0xdc/0x177
net_rx_action+0xf9/0x270
__do_softirq+0x1b6/0x39a
? smpboot_register_percpu_thread+0xce/0xce
run_ksoftirqd+0x1d/0x42
smpboot_thread_fn+0x19e/0x1b3
kthread+0xf1/0xf6
? kthread_delayed_work_timer_fn+0x83/0x83
ret_from_fork+0x24/0x30
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During health reporter operations, driver might want to fill-up
the extack message, so propagate extack down to the health reporter ops.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If reporter state is healthy, don't call into a driver for recover and
don't increase recovery count.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove pointless use of size return variable by directly returning
sizes.
Signed-off-by: Vito Caputo <vcaputo@pengaru.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove pointless return variable dance.
Appears vestigial from when the function did locking as seen in
unix_find_socket_byinode(), but locking is handled in
unix_find_socket_byname() for __unix_find_socket_byname().
Signed-off-by: Vito Caputo <vcaputo@pengaru.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
|
|
Fix typo 'boolian' into 'boolean'.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191011084303.28418-1-anton.ivanov@cambridgegreys.com
|
|
It is currently not possible to detach the flow dissector program and
attach a new one in an atomic fashion, that is with a single syscall.
Attempts to do so will be met with EEXIST error.
This makes updates to flow dissector program hard. Traffic steering that
relies on BPF-powered flow dissection gets disrupted while old program has
been already detached but the new one has not been attached yet.
There is also a window of opportunity to attach a flow dissector to a
non-root namespace while updating the root flow dissector, thus blocking
the update.
Lastly, the behavior is inconsistent with cgroup BPF programs, which can be
replaced with a single bpf(BPF_PROG_ATTACH, ...) syscall without any
restrictions.
Allow attaching a new flow dissector program when another one is already
present with a restriction that it can't be the same program.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191011082946.22695-2-jakub@cloudflare.com
|
|
fallthrough will become a pseudo reserved keyword so this only use of
fallthrough is better renamed to allow it.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Reduces per-rate data structure size
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20191008171139.96476-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Rate success probability usually fluctuates a lot under normal conditions.
With a simple EWMA, noise and fluctuation can be reduced by increasing the
window length, but that comes at the cost of introducing lag on sudden
changes.
This change replaces the EWMA implementation with a moving average that's
designed to significantly reduce lag while keeping a bigger window size
by being better at filtering out noise.
It is only slightly more expensive than the simple EWMA and still avoids
divisions in its calculation.
The algorithm is adapted from an implementation intended for a completely
different field (stock market trading), where the tradeoff of lag vs
noise filtering is equally important. It is based on the "smoothing filter"
from http://www.stockspotter.com/files/PredictiveIndicators.pdf.
I have adapted it to fixed-point math with some constants so that it uses
only addition, bit shifts and multiplication
To better make use of the filtering and bigger window size, the update
interval time is cut in half.
For testing, the algorithm can be reverted to the older one via debugfs
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20191008171139.96476-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Use a slightly different threshold for downgrading spatial streams to
make it easier to calculate without divisions.
Slightly reduces CPU overhead.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20191008171139.96476-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
cfg80211_assign_cookie already checks & prevents a 0 from being
returned, so the explicit loop is unnecessary.
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20191008164350.2836-1-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
smc_rx_recvmsg() first checks if data is available, and then if
RCV_SHUTDOWN is set. There is a race when smc_cdc_msg_recv_action() runs
in between these 2 checks, receives data and sets RCV_SHUTDOWN.
In that case smc_rx_recvmsg() would return from receive without to
process the available data.
Fix that with a final check for data available if RCV_SHUTDOWN is set.
Move the check for data into a function and call it twice.
And use the existing helper smc_rx_data_available().
Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
smc_cdc_rxed_any_close_or_senddone() is used as an end condition for the
receive loop. This conflicts with smc_cdc_msg_recv_action() which could
run in parallel and set the bits checked by
smc_cdc_rxed_any_close_or_senddone() before the receive is processed.
In that case we could return from receive with no data, although data is
available. The same applies to smc_rx_wait().
Fix this by checking for RCV_SHUTDOWN only, which is set in
smc_cdc_msg_recv_action() after the receive was actually processed.
Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
If creation of an SMCD link group with VLAN id fails, the initial
smc_ism_get_vlan() step has to be reverted as well.
Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from
softirq context") there has been a race to the value of the sk_err if both
XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case,
we may end up losing the sk_err value that existed when xs_error_report was
called.
Fix this by reverting to the previous behavior: instead of using SO_ERROR
to retrieve the value at a later time (which might also return sk_err_soft),
copy the sk_err value onto struct sock_xprt, and use that value to wake
pending tasks.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
sk->sk_backlog.len can be written by BH handlers, and read
from process contexts in a lockless way.
Note the write side should also use WRITE_ONCE() or a variant.
We need some agreement about the best way to do this.
syzbot reported :
BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
sk_add_backlog include/net/sock.h:934 [inline]
tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
tcp_space include/net/tcp.h:1373 [inline]
tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2427
release_sock+0x61/0x160 net/core/sock.c:2943
tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket
lock for example from tcp_poll().
Use READ_ONCE() to document the fact that other cpus might change
sk->sk_rcvlowat under us and avoid KCSAN splats.
Use WRITE_ONCE() on write sides too.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
sk_add_backlog() callers usually read sk->sk_rcvbuf without
owning the socket lock. This means sk_rcvbuf value can
be changed by other cpus, and KCSAN complains.
Add READ_ONCE() annotations to document the lockless nature
of these reads.
Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
but this will be done in separate patches to ease stable
backports (if we decide this is relevant for stable trees).
BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
__sk_add_backlog include/net/sock.h:902 [inline]
sk_add_backlog include/net/sock.h:933 [inline]
tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
ksys_read+0xd5/0x1b0 fs/read_write.c:587
__do_sys_read fs/read_write.c:597 [inline]
__se_sys_read fs/read_write.c:595 [inline]
__x64_sys_read+0x4c/0x60 fs/read_write.c:595
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
tcp_memory_pressure is read without holding any lock,
and its value could be changed on other cpus.
Use READ_ONCE() to annotate these lockless reads.
The write side is already using atomic ops.
Fixes: b8da51ebb1aa ("tcp: introduce tcp_under_memory_pressure()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
reqsk_queue_empty() is called from inet_csk_listen_poll() while
other cpus might write ->rskq_accept_head value.
Use {READ|WRITE}_ONCE() to avoid compiler tricks
and potential KCSAN splats.
Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
a C compiler can legally transform :
if (memory_pressure && *memory_pressure)
*memory_pressure = 0;
to :
if (memory_pressure)
*memory_pressure = 0;
Fixes: 0604475119de ("tcp: add TCPMemoryPressuresChrono counter")
Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
Fixes: 3ab224be6d69 ("[NET] CORE: Introducing new memory accounting interface.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
As hinted by KCSAN, we need at least one READ_ONCE()
to prevent a compiler optimization.
More details on :
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
sysbot report :
BUG: KCSAN: data-race in __nf_ct_refresh_acct / __nf_ct_refresh_acct
read to 0xffff888123eb4f08 of 4 bytes by interrupt on cpu 0:
__nf_ct_refresh_acct+0xd4/0x1b0 net/netfilter/nf_conntrack_core.c:1796
nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
ipv4_conntrack_in+0x27/0x40 net/netfilter/nf_conntrack_proto.c:178
nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
nf_hook include/linux/netfilter.h:260 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x12f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
__do_softirq+0x115/0x33f kernel/softirq.c:292
write to 0xffff888123eb4f08 of 4 bytes by task 7191 on cpu 1:
__nf_ct_refresh_acct+0xfb/0x1b0 net/netfilter/nf_conntrack_core.c:1797
nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
ipv4_conntrack_local+0xbe/0x130 net/netfilter/nf_conntrack_proto.c:200
nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
nf_hook include/linux/netfilter.h:260 [inline]
__ip_local_out+0x1f7/0x2b0 net/ipv4/ip_output.c:114
ip_local_out+0x31/0x90 net/ipv4/ip_output.c:123
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:236
__tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
__tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7191 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: cc16921351d8 ("netfilter: conntrack: avoid same-timeout update")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
The flag NLM_F_ECHO aims to reply to the user the message notified to all
listeners.
It was not the case with the command RTM_NEWNSID, let's fix this.
Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Make sure a terminated SMC socket reaches the CLOSED state.
Even if sending of close flags fails, change the socket state to
the intended state to avoid dangling sockets not reaching the
CLOSED state.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Add a "going_away" indication to ISM devices and IB ports and
avoid creation of new connections on such disappearing devices.
And do not handle ISM events if ISM device is disappearing.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
SMCD link groups belong to certain ISM-devices and SMCR link group
links belong to certain IB-devices. Increase the refcount for
these devices, as long as corresponding link groups exist.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
This patch introduces separate locks for the split SMCD and SMCR
link group lists.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Currently SMCD and SMCR link groups are maintained in one list.
To facilitate abnormal termination handling they are split into
a separate list for SMCR link groups and separate lists for SMCD
link groups per SMCD device.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
If tcf_register_action failed, mirred_device_notifier
should be unregistered.
Fixes: 3b87956ea645 ("net sched: fix race in mirred device removal")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
When configuring a taprio instance if "flags" is not specified (or
it's zero), taprio currently replies with an "Invalid argument" error.
So, set the return value to zero after we are done with all the
checks.
Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
This patch is to add a new event SCTP_SEND_FAILED_EVENT described in
rfc6458#section-6.1.11. It's a update of SCTP_SEND_FAILED event:
struct sctp_sndrcvinfo ssf_info is replaced with
struct sctp_sndinfo ssfe_info in struct sctp_send_failed_event.
SCTP_SEND_FAILED is being deprecated, but we don't remove it in this
patch. Both are being processed in sctp_datamsg_destroy() when the
corresp event flag is set.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
sctp_ulpevent_nofity_peer_addr_change() would be called in
sctp_assoc_set_primary() to send SCTP_ADDR_MADE_PRIM event
when this transport is set to the primary path of the asoc.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_MADE_PRIM: This address has now been made the primary
destination address. This notification is provided whenever an
address is made primary.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
sctp_ulpevent_nofity_peer_addr_change() is called in
sctp_assoc_rm_peer() to send SCTP_ADDR_REMOVED event
when this transport is removed from the asoc.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_REMOVED: The address is no longer part of the
association.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
A helper sctp_ulpevent_nofity_peer_addr_change() will be extracted
to make peer_addr_change event and enqueue it, and the helper will
be called in sctp_assoc_add_peer() to send SCTP_ADDR_ADDED event.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_ADDED: The address is now part of the association.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|