Age | Commit message (Collapse) | Author |
|
TIPC looks concentrated in itself, and other pernet_operations
seem not touching its entities.
tipc_net_ops look pernet-divided, and they should be safe to
be executed in parallel for several net the same time.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations create and destroy net::sctp::ctl_sock.
Since pernet_operations do not send sctp packets each other,
they look safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations have a deal with sysctl, /proc
entries and statistics. Also, there are freeing of
net::sctp::addr_waitq queue and net::sctp::local_addr_list
in exit method. All of them look pernet-divided, and it
seems these items are only interesting for sctp_defaults_ops,
which are safe to be executed in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return error code -EINVAL in the address len check error handling
case since 'err' can be overwrite to 0 by 'err = sctp_verify_addr()'
in the for loop.
Fixes: 2c0dbaa0c43d ("sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2018-03-13
1) Refuse to insert 32 bit userspace socket policies on 64
bit systems like we do it for standard policies. We don't
have a compat layer, so inserting socket policies from
32 bit userspace will lead to a broken configuration.
2) Make the policy hold queue work without the flowcache.
Dummy bundles are not chached anymore, so we need to
generate a new one on each lookup as long as the SAs
are not yet in place.
3) Fix the validation of the esn replay attribute. The
The sanity check in verify_replay() is bypassed if
the XFRM_STATE_ESN flag is not set. Fix this by doing
the sanity check uncoditionally.
From Florian Westphal.
4) After most of the dst_entry garbage collection code
is removed, we may leak xfrm_dst entries as they are
neither cached nor tracked somewhere. Fix this by
reusing the 'uncached_list' to track xfrm_dst entries
too. From Xin Long.
5) Fix a rcu_read_lock/rcu_read_unlock imbalance in
xfrm_get_tos() From Xin Long.
6) Fix an infinite loop in xfrm_get_dst_nexthop. On
transport mode we fetch the child dst_entry after
we continue, so this pointer is never updated.
Fix this by fetching it before we continue.
7) Fix ESN sequence number gap after IPsec GSO packets.
We accidentally increment the sequence number counter
on the xfrm_state by one packet too much in the ESN
case. Fix this by setting the sequence number to the
correct value.
8) Reset the ethernet protocol after decapsulation only if a
mac header was set. Otherwise it breaks configurations
with TUN devices. From Yossi Kuperman.
9) Fix __this_cpu_read() usage in preemptible code. Use
this_cpu_read() instead in ipcomp_alloc_tfms().
From Greg Hackmann.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
__this_cpu_read() call inside ipcomp_alloc_tfms().
At the time, __this_cpu_read() required the caller to either not care
about races or to handle preemption/interrupt issues. 3.15 tightened
the rules around some per-cpu operations, and now __this_cpu_read()
should never be used in a preemptible context. On 3.15 and later, we
need to use this_cpu_read() instead.
syzkaller reported this leading to the following kernel BUG while
fuzzing sendmsg:
BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101
caller is ipcomp_init_state+0x185/0x990
CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0xb9/0x115
check_preemption_disabled+0x1cb/0x1f0
ipcomp_init_state+0x185/0x990
? __xfrm_init_state+0x876/0xc20
? lock_downgrade+0x5e0/0x5e0
ipcomp4_init_state+0xaa/0x7c0
__xfrm_init_state+0x3eb/0xc20
xfrm_init_state+0x19/0x60
pfkey_add+0x20df/0x36f0
? pfkey_broadcast+0x3dd/0x600
? pfkey_sock_destruct+0x340/0x340
? pfkey_seq_stop+0x80/0x80
? __skb_clone+0x236/0x750
? kmem_cache_alloc+0x1f6/0x260
? pfkey_sock_destruct+0x340/0x340
? pfkey_process+0x62a/0x6f0
pfkey_process+0x62a/0x6f0
? pfkey_send_new_mapping+0x11c0/0x11c0
? mutex_lock_io_nested+0x1390/0x1390
pfkey_sendmsg+0x383/0x750
? dump_sp+0x430/0x430
sock_sendmsg+0xc0/0x100
___sys_sendmsg+0x6c8/0x8b0
? copy_msghdr_from_user+0x3b0/0x3b0
? pagevec_lru_move_fn+0x144/0x1f0
? find_held_lock+0x32/0x1c0
? do_huge_pmd_anonymous_page+0xc43/0x11e0
? lock_downgrade+0x5e0/0x5e0
? get_kernel_page+0xb0/0xb0
? _raw_spin_unlock+0x29/0x40
? do_huge_pmd_anonymous_page+0x400/0x11e0
? __handle_mm_fault+0x553/0x2460
? __fget_light+0x163/0x1f0
? __sys_sendmsg+0xc7/0x170
__sys_sendmsg+0xc7/0x170
? SyS_shutdown+0x1a0/0x1a0
? __do_page_fault+0x5a0/0xca0
? lock_downgrade+0x5e0/0x5e0
SyS_sendmsg+0x27/0x40
? __sys_sendmsg+0x170/0x170
do_syscall_64+0x19f/0x640
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f0ee73dfb79
RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79
RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004
RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0
R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440
R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
During the conversion to dsa_is_user_port(), a condition ended up being
reversed, which would prevent the creation of any user port when using
the legacy binding and/or platform data, fix that.
Fixes: 4a5b85ffe2a0 ("net: dsa: use dsa_is_user_port everywhere")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The l2tp_tunnel_create() function checks for v4mapped ipv6
sockets and cache that flag, so that l2tp core code can
reusing it at xmit time.
If the socket is provided by the userspace, the connection
status of the tunnel sockets can change between the tunnel
creation and the xmit call, so that syzbot is able to
trigger the following splat:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192
[inline]
BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260
net/ipv6/ip6_output.c:264
Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448
CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
ip6_dst_idev include/net/ip6_fib.h:192 [inline]
ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline]
l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148
pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
__sys_sendmsg+0xe5/0x210 net/socket.c:2080
SYSC_sendmsg net/socket.c:2091 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2087
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x453e69
RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69
RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000
This change addresses the issues:
* explicitly checking for TCP_ESTABLISHED for user space provided sockets
* dropping the v4mapped flag usage - it can become outdated - and
explicitly invoking ipv6_addr_v4mapped() instead
The issue is apparently there since ancient times.
v1 -> v2: (many thanks to Guillaume)
- with csum issue introduced in v1
- replace pr_err with pr_debug
- fix build issue with IPV6 disabled
- move l2tp_sk_is_v4mapped in l2tp_core.c
v2 -> v3:
- don't update inet_daddr for v4mapped address, unneeded
- drop rendundant check at creation time
Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On unsuccesful ip6_datagram_connect(), if the failure is caused by
ip6_datagram_dst_update(), the sk peer information are cleared, but
the sk->sk_state is preserved.
If the socket was already in an established status, the overall sk
status is inconsistent and fouls later checks in datagram code.
Fix this saving the old peer information and restoring them in
case of failure. This also aligns ipv6 datagram connect() behavior
with ipv4.
v1 -> v2:
- added missing Fixes tag
Fixes: 85cb73ff9b74 ("net: ipv6: reset daddr and dport in sk if connect() fails")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid VLA[1] by using an already allocated buffer passed
by the caller.
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid VLA[1] by using an already allocated buffer passed
by the caller.
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree, they are:
1) Fixed hashtable representation doesn't support timeout flag, skip it
otherwise rules to add elements from the packet fail bogusly fail with
EOPNOTSUPP.
2) Fix bogus error with 32-bits ebtables userspace and 64-bits kernel,
patch from Florian Westphal.
3) Sanitize proc names in several x_tables extensions, also from Florian.
4) Add sanitization to ebt_among wormhash logic, from Florian.
5) Missing release of hook array in flowtable.
====================
|
|
Same as LRO, hardware GRO cannot be enabled with RX-FCS.
When both are requested, hardware GRO will be dropped.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid a VLA[1] by using a real constant expression instead of a variable.
The compiler should be able to optimize the original code and avoid using
an actual VLA. Anyway this change is useful because it will avoid a false
positive with -Wvla, it might also help the compiler generating better
code.
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable sg_off is assigned a value but it is never read, hence it is
redundant and can be removed.
Cleans up clang warning:
net/rds/message.c:373:2: warning: Value stored to 'sg_off' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make use of the new helper.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
registered
Now when using 'ss' in iproute, kernel would try to load all _diag
modules, which also causes corresponding family and proto modules
to be loaded as well due to module dependencies.
Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
would be loaded.
For example:
$ lsmod|grep sctp
$ ss
$ lsmod|grep sctp
sctp_diag 16384 0
sctp 323584 5 sctp_diag
inet_diag 24576 4 raw_diag,tcp_diag,sctp_diag,udp_diag
libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
As these family and proto modules are loaded unintentionally, it
could cause some problems, like:
- Some debug tools use 'ss' to collect the socket info, which loads all
those diag and family and protocol modules. It's noisy for identifying
issues.
- Users usually expect to drop sctp init packet silently when they
have no sense of sctp protocol instead of sending abort back.
- It wastes resources (especially with multiple netns), and SCTP module
can't be unloaded once it's loaded.
...
In short, it's really inappropriate to have these family and proto
modules loaded unexpectedly when just doing debugging with inet_diag.
This patch is to introduce sock_load_diag_module() where it loads
the _diag module only when it's corresponding family or proto has
been already registered.
Note that we can't just load _diag module without the family or
proto loaded, as some symbols used in _diag module are from the
family or proto module.
v1->v2:
- move inet proto check to inet_diag to avoid a compiling err.
v2->v3:
- define sock_load_diag_module in sock.c and export one symbol
only.
- improve the changelog.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Max delat_t should be the full_bucket/rate instead of the full_bucket.
Also report EINVAL if the rate is zero.
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise we leak this array.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ebt_among is special, it has a dynamic match size and is exempt
from the central size checks.
commit c4585a2823edf ("bridge: ebt_among: add missing match size checks")
added validation for pool size, but missed fact that the macros
ebt_among_wh_src/dst can already return out-of-bound result because
they do not check value of wh_src/dst_ofs (an offset) vs. the size
of the match that userspace gave to us.
v2:
check that offset has correct alignment.
Paolo Abeni points out that we should also check that src/dst
wormhash arrays do not overlap, and src + length lines up with
start of dst (or vice versa).
v3: compact wormhash_sizes_valid() part
NB: Fixes tag is intentionally wrong, this bug exists from day
one when match was added for 2.6 kernel. Tag is there so stable
maintainers will notice this one too.
Tested with same rules from the earlier patch.
Fixes: c4585a2823edf ("bridge: ebt_among: add missing match size checks")
Reported-by: <syzbot+bdabab6f1983a03fc009@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
recent and hashlimit both create /proc files, but only check that
name is 0 terminated.
This can trigger WARN() from procfs when name is "" or "/".
Add helper for this and then use it for both.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The last rule in the blob has next_entry offset that is same as total size.
This made "ebtables32 -A OUTPUT -d de:ad:be:ef:01:02" fail on 64 bit kernel.
Fixes: b71812168571fa ("netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The patch adds skb_cow_header() to ensure enough headroom
at ip6erspan_tunnel_xmit before pushing the erspan header
to the skb.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When users fill in incorrect erspan version number through
the struct erspan_metadata uapi, current code skips pushing
the erspan header but continue pushing the gre header, which
is incorrect. The patch fixes it by returning error.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The patch adds the erspan v2 proto in ip6gre_tunnel_lookup
so the erspan v2 tunnel can be found correctly.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some network devices - notably ipvlan slave - are not compatible with
any kind of rx_handler. Currently the hook can be installed but any
configuration (bridge, bond, macsec, ...) is nonfunctional.
This change allocates a priv_flag bit to mark such devices and explicitly
forbid installing a rx_handler if such bit is set. The new bit is used
by ipvlan slave device.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wvla, remove VLA usage and replace it
with a fixed-length array instead.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As well as the basic conversion, I noticed that a lot of the
SCTP code checks gso_type without first checking skb_is_gso()
so I have added that where appropriate.
Also, document the helper.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce routines to calculate size of the shared tc netlink attributes
and the full message size including netlink header and tc service header.
Update add/delete action logic to have the size for event messages,
the size is passed to tcf_add_notify() and tcf_del_notify() where the
notification message is being allocated and constructed.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a new function argument to carry total attributes size for
correct allocation of skb in event messages.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
ip6tnl0, ip6gre0) are automatically created when the corresponding
module is loaded.
These tunnels are also automatically created when a new network
namespace is created, at a great cost.
In many cases, netns are used for isolation purposes, and these
extra network devices are a waste of resources. We are using
thousands of netns per host, and hit the netns creation/delete
bottleneck a lot. (Many thanks to Kirill for recent work on this)
Add a new sysctl so that we can opt-out from this automatic creation.
Note that these tunnels are still created for the initial namespace,
to be the least intrusive for typical setups.
Tested:
lpk43:~# cat add_del_unshare.sh
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait
lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh
real 0m37.521s
user 0m0.886s
sys 7m7.084s
lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh
real 0m4.761s
user 0m0.851s
sys 1m8.343s
lpk43:~#
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A tun device type can trivially be set to arbitrary value using
TUNSETLINK ioctl().
Therefore, lowpan_device_event() must really check that ieee802154_ptr
is not NULL.
Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the following slab-out-of-bounds kasan report in
ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not
linear and the accessed data are not in the linear data region of orig_skb.
[ 1503.122508] ==================================================================
[ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990
[ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932
[ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124
[ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 1503.123527] Call Trace:
[ 1503.123579] <IRQ>
[ 1503.123638] print_address_description+0x6e/0x280
[ 1503.123849] kasan_report+0x233/0x350
[ 1503.123946] memcpy+0x1f/0x50
[ 1503.124037] ndisc_send_redirect+0x94e/0x990
[ 1503.125150] ip6_forward+0x1242/0x13b0
[...]
[ 1503.153890] Allocated by task 1932:
[ 1503.153982] kasan_kmalloc+0x9f/0xd0
[ 1503.154074] __kmalloc_track_caller+0xb5/0x160
[ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70
[ 1503.154324] __alloc_skb+0x130/0x3e0
[ 1503.154415] sctp_packet_transmit+0x21a/0x1810
[ 1503.154533] sctp_outq_flush+0xc14/0x1db0
[ 1503.154624] sctp_do_sm+0x34e/0x2740
[ 1503.154715] sctp_primitive_SEND+0x57/0x70
[ 1503.154807] sctp_sendmsg+0xaa6/0x1b10
[ 1503.154897] sock_sendmsg+0x68/0x80
[ 1503.154987] ___sys_sendmsg+0x431/0x4b0
[ 1503.155078] __sys_sendmsg+0xa4/0x130
[ 1503.155168] do_syscall_64+0x171/0x3f0
[ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.155436] Freed by task 1932:
[ 1503.155527] __kasan_slab_free+0x134/0x180
[ 1503.155618] kfree+0xbc/0x180
[ 1503.155709] skb_release_data+0x27f/0x2c0
[ 1503.155800] consume_skb+0x94/0xe0
[ 1503.155889] sctp_chunk_put+0x1aa/0x1f0
[ 1503.155979] sctp_inq_pop+0x2f8/0x6e0
[ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230
[ 1503.156164] sctp_inq_push+0x117/0x150
[ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0
[ 1503.156346] __release_sock+0x142/0x250
[ 1503.156436] release_sock+0x80/0x180
[ 1503.156526] sctp_sendmsg+0xbb0/0x1b10
[ 1503.156617] sock_sendmsg+0x68/0x80
[ 1503.156708] ___sys_sendmsg+0x431/0x4b0
[ 1503.156799] __sys_sendmsg+0xa4/0x130
[ 1503.156889] do_syscall_64+0x171/0x3f0
[ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 1503.157158] The buggy address belongs to the object at ffff8800298ab600
which belongs to the cache kmalloc-1024 of size 1024
[ 1503.157444] The buggy address is located 176 bytes inside of
1024-byte region [ffff8800298ab600, ffff8800298aba00)
[ 1503.157702] The buggy address belongs to the page:
[ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 1503.158053] flags: 0x4000000000008100(slab|head)
[ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e
[ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000
[ 1503.158523] page dumped because: kasan: bad access detected
[ 1503.158698] Memory state around the buggy address:
[ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1503.159338] ^
[ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1503.159785] ==================================================================
[ 1503.159964] Disabling lock debugging due to kernel taint
The test scenario to trigger the issue consists of 4 devices:
- H0: data sender, connected to LAN0
- H1: data receiver, connected to LAN1
- GW0 and GW1: routers between LAN0 and LAN1. Both of them have an
ethernet connection on LAN0 and LAN1
On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for
data from LAN0 to LAN1.
Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent
data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send
buffer size is set to 16K). While data streams are active flush the route
cache on HA multiple times.
I have not been able to identify a given commit that introduced the issue
since, using the reproducer described above, the kasan report has been
triggered from 4.14 and I have not gone back further.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We use a two-step process to configure a filter with RSS spreading. First,
the RSS context is allocated and configured using ETHTOOL_SRSSH; this
returns an identifier (rss_context) which can then be passed to subsequent
invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS
indirection table lookup should be added to the queue number (ring_cookie)
when delivering the packet. Drivers for devices which can only use the
indirection table entry directly (not add it to a base queue number)
should reject rule insertions combining RSS with a nonzero ring_cookie.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: d40a126b16ea ("rds: refactor zcopy code into rds_message_zcopy_from_user")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two error paths which are missing unlocks in this function.
Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We're supposed to use kfree_skb() to free these sk_buffs.
Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than the skb->len in function packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, and an ARP packet sent to this tunnel has
skb->len = 42. This causes the 'unsign int length' to become super
large because it is negative value, causing the later ovs_vport_send
to drop it due to over-mtu size. The patch fixes it by setting it to 0.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations are similar to ipv4_net_ops.
They are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations register and unregister bunch
of nf_conntrack_l4proto. Exit method unregisters related
sysctl, init method calls init_net and get_net_proto.
The whole builtin_l4proto4 array has pretty simple
init_net and get_net_proto methods. The first one register
sysctl table, the second one is just RO memory dereference.
So, these pernet_operations are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations unregister net::ipv4::iptable_security table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations unregister net::ipv4::iptable_raw table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations unregister net::ipv4::nat_table table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations unregister net::ipv4::iptable_mangle table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations unregister net::ipv4::arptable_filter.
Another net/pernet_operations do not send arp packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations create per-net pktgen threads
and /proc entries. These pernet subsys looks closed
in itself, and there are no pernet_operations outside
this file, which are interested in the threads.
Init and/or exit methods look safe to be executed
in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations register and unregister net::nf::queue_handler
and /proc entry. The handler is accessed only under RCU, so this looks
safe to convert them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations create and destroy /proc entries.
Also, exit method unsets nfulnl_logger. The logger is not
set by default, and it becomes bound via userspace request.
So, they look safe to be made async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|