Age | Commit message (Collapse) | Author |
|
Syzkaller hit a warning:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0
Modules linked in:
CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xdf/0x1d0
Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1
RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac
RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001
RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139
R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4
R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040
FS: 00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? show_regs+0xa3/0xc0
? __warn+0xa5/0x1c0
? refcount_warn_saturate+0xdf/0x1d0
? report_bug+0x1fc/0x2d0
? refcount_warn_saturate+0xdf/0x1d0
? handle_bug+0xa1/0x110
? exc_invalid_op+0x3c/0xb0
? asm_exc_invalid_op+0x1f/0x30
? __warn_printk+0xcc/0x140
? __warn_printk+0xd5/0x140
? refcount_warn_saturate+0xdf/0x1d0
get_net_ns+0xa4/0xc0
? __pfx_get_net_ns+0x10/0x10
open_related_ns+0x5a/0x130
__tun_chr_ioctl+0x1616/0x2370
? __sanitizer_cov_trace_switch+0x58/0xa0
? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
? __pfx_tun_chr_ioctl+0x10/0x10
tun_chr_ioctl+0x2f/0x40
__x64_sys_ioctl+0x11b/0x160
x64_sys_call+0x1211/0x20d0
do_syscall_64+0x9e/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5b28f165d7
Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8
RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7
RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003
RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0
R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730
R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...
This is trigger as below:
ns0 ns1
tun_set_iff() //dev is tun0
tun->dev = dev
//ip link set tun0 netns ns1
put_net() //ref is 0
__tun_chr_ioctl() //TUNGETDEVNETNS
net = dev_net(tun->dev);
open_related_ns(&net->ns, get_net_ns); //ns1
get_net_ns()
get_net() //addition on 0
Use maybe_get_net() in get_net_ns in case net's ref is zero to fix this
Fixes: 0c3e0e3bb623 ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20240614131302.2698509-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly.
syzbot reported:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64
Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00
RSP: 0018:ffffc90000117378 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7
RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98
RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000
R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline]
xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline]
xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541
xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835
xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline]
xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201
xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline]
xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309
ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256
send6+0x611/0xd20 drivers/net/wireguard/socket.c:139
wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178
wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40
wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
process_scheduled_works kernel/workqueue.c:3312 [inline]
worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot caught a NULL dereference in rt6_probe() [1]
Bail out if __in6_dev_get() returns NULL.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f]
CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline]
RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758
Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19
RSP: 0018:ffffc900034af070 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000
RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c
RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a
R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000
FS: 00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784
nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496
__find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825
find_rr_leaf net/ipv6/route.c:853 [inline]
rt6_select net/ipv6/route.c:897 [inline]
fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195
ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231
pol_lookup_func include/net/ip6_fib.h:616 [inline]
fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121
ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline]
ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651
ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147
ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250
rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898
inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
sock_write_iter+0x4b8/0x5c0 net/socket.c:1160
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x6b6/0x1140 fs/read_write.c:590
ksys_write+0x1f8/0x260 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 52e1635631b3 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reminds us that in6_dev_get() can return NULL.
fib6_nh_init()
ip6_validate_gw( &idev )
ip6_route_check_nh( idev )
*idev = in6_dev_get(dev); // can be NULL
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606
Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b
RSP: 0018:ffffc900032775a0 EFLAGS: 00010202
RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8
RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000
R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8
R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000
FS: 00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809
ip6_route_add+0x28/0x160 net/ipv6/route.c:3853
ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483
inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f940f07cea9
Fixes: 428604fb118f ("ipv6: do not set routes if disable_ipv6 has been enabled")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240614082002.26407-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some applications were reporting ETIMEDOUT errors on apparently
good looking flows, according to packet dumps.
We were able to root cause the issue to an accidental setting
of tp->retrans_stamp in the following scenario:
- client sends TFO SYN with data.
- server has TFO disabled, ACKs only SYN but not payload.
- client receives SYNACK covering only SYN.
- tcp_ack() eats SYN and sets tp->retrans_stamp to 0.
- tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue()
to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now",
but we are not in any loss recovery state.
- TFO payload is ACKed.
- we are not in any loss recovery state, and don't see any dupacks,
so we don't get to any code path that clears tp->retrans_stamp.
- tp->retrans_stamp stays non-zero for the lifetime of the connection.
- after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO
to 1 jiffy due to bogus tp->retrans_stamp.
- on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out()
sets start_ts from tp->retrans_stamp from TFO payload retransmit
hours/days ago, and computes bogus long elapsed time for loss recovery,
and suffers ETIMEDOUT early.
Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
CC: stable@vger.kernel.org
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Co-developed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240614130615.396837-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a memory leak in nr_create() [0].
Commit 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.")
added sock_hold() to the nr_heartbeat_expiry() function, where
a) a socket has a SOCK_DESTROY flag or
b) a listening socket has a SOCK_DEAD flag.
But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor
has already been closed and the nr_release() function has been called.
So it makes no sense to hold the reference count because no one will
call another nr_destroy_socket() and put it as in the case "b."
nr_connect
nr_establish_data_link
nr_start_heartbeat
nr_release
switch (nr->state)
case NR_STATE_3
nr->state = NR_STATE_2
sock_set_flag(sk, SOCK_DESTROY);
nr_rx_frame
nr_process_rx_frame
switch (nr->state)
case NR_STATE_2
nr_state2_machine()
nr_disconnect()
nr_sk(sk)->state = NR_STATE_0
sock_set_flag(sk, SOCK_DEAD)
nr_heartbeat_expiry
switch (nr->state)
case NR_STATE_0
if (sock_flag(sk, SOCK_DESTROY) ||
(sk->sk_state == TCP_LISTEN
&& sock_flag(sk, SOCK_DEAD)))
sock_hold() // ( !!! )
nr_destroy_socket()
To fix the memory leak, let's call sock_hold() only for a listening socket.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
[0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
Fixes: 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
-EINVAL will interrupt the dump. The correct error to return
if we have more data to dump is -EMSGSIZE.
Discovered by doing:
for i in `seq 80`; do ip link add type veth; done
./cli.py --dbg-small-recv 5300 --spec netdev.yaml --dump dev-get >> /dev/null
[...]
nl_len = 64 (48) nl_flags = 0x0 nl_type = 19
nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
error: -22
Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff")
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240613213044.3675745-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-14
We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).
The main changes are:
1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
triggered via __bpf_try_make_writable(), from Florian Westphal.
2) Fix removal of kfuncs during linking phase which then throws a kernel
build warning via resolve_btfids about unresolved symbols,
from Tony Ambardar.
3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
is not available on User Mode Linux, from Maciej Żenczykowski.
4) Fix a register corruption in reg_set_min_max triggering an invariant
violation in BPF verifier, from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Harden __bpf_kfunc tag against linker kfunc removal
compiler_types.h: Define __retain for __attribute__((__retain__))
bpf: Avoid splat in pskb_pull_reason
bpf: fix UML x86_64 compile failure
selftests/bpf: Add test coverage for reg_set_min_max handling
bpf: Reduce stack consumption in check_stack_write_fixed_off
bpf: Fix reg_set_min_max corruption of fake_reg
MAINTAINERS: mailmap: Update Stanislav's email address
====================
Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller builds (CONFIG_DEBUG_NET=y) frequently trigger a debug
hint in pskb_may_pull.
We'd like to retain this debug check because it might hint at integer
overflows and other issues (kernel code should pull headers, not huge
value).
In bpf case, this splat isn't interesting at all: such (nonsensical)
bpf programs are typically generated by a fuzzer anyway.
Do what Eric suggested and suppress such warning.
For CONFIG_DEBUG_NET=n we don't need the extra check because
pskb_may_pull will do the right thing: return an error without the
WARN() backtrace.
Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers")
Reported-by: syzbot+0c4150bff9fff3bf023c@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://syzkaller.appspot.com/bug?extid=0c4150bff9fff3bf023c
Link: https://lore.kernel.org/netdev/9f254c96-54f2-4457-b7ab-1d9f6187939c@gmail.com/
Link: https://lore.kernel.org/bpf/20240614101801.9496-1-fw@strlen.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Various fixes:
* cfg80211: wext scan
* mac80211: monitor regression, scan counted_by, offload
* iwlwifi: locking, 6 GHz scan, remain-on-channel
* tag 'wireless-2024-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix monitor channel with chanctx emulation
wifi: mac80211: Avoid address calculations via out of bounds array indexing
wifi: mac80211: Recalc offload when monitor stop
wifi: iwlwifi: scan: correctly check if PSC listen period is needed
wifi: iwlwifi: mvm: fix ROC version check
wifi: iwlwifi: mvm: unlock mvm mutex
wifi: cfg80211: wext: add extra SIOCSIWSCAN data check
wifi: cfg80211: wext: set ssids=NULL for passive scans
====================
Link: https://lore.kernel.org/r/20240614085710.24103-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As the comment in this function says, the code currently just clears the
CIPSO part with IPOPT_NOP, rather than removing it completely and
trimming the packet. The other cipso_v4_*_delattr() functions, however,
do the proper removal and also calipso_skbuff_delattr() makes an effort
to remove the CALIPSO options instead of replacing them with padding.
Some routers treat IPv4 packets with anything (even NOPs) in the option
header as a special case and take them through a slower processing path.
Consequently, hardening guides such as STIG recommend to configure such
routers to drop packets with non-empty IP option headers [1][2]. Thus,
users might expect NetLabel to produce packets with minimal padding (or
at least with no padding when no actual options are present).
Implement the proper option removal to address this and to be closer to
what the peer functions do.
[1] https://www.stigviewer.com/stig/juniper_router_rtr/2019-09-27/finding/V-90937
[2] https://www.stigviewer.com/stig/cisco_ios_xe_router_rtr/2021-03-26/finding/V-217001
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As evident from the definition of ip_options_get(), the IP option
IPOPT_END is used to pad the IP option data array, not IPOPT_NOP. Yet
the loop that walks the IP options to determine the total IP options
length in cipso_v4_delopt() doesn't take IPOPT_END into account.
Fix it by recognizing the IPOPT_END value as the end of actual options.
Fixes: 014ab19a69c3 ("selinux: Set socket NetLabel based on connection endpoint")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the channel context emulation, there were reports that
changing the monitor channel no longer works. This is because
those drivers don't have WANT_MONITOR_VIF, so the setting the
channel always exits out quickly.
Fix this by always allocating the virtual monitor sdata, and
simply not telling the driver about it unless it wanted to.
This way, we have an interface/sdata to bind the chanctx to,
and the emulation can work correctly.
Cc: stable@vger.kernel.org
Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers")
Reported-and-tested-by: Savyasaachi Vanga <savyasaachiv@gmail.com>
Closes: https://lore.kernel.org/r/chwoymvpzwtbmzryrlitpwmta5j6mtndocxsyqvdyikqu63lon@gfds653hkknl
Link: https://msgid.link/20240612122351.b12d4a109dde.I1831a44417faaab92bea1071209abbe4efbe3fba@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
req->n_channels must be set before req->channels[] can be used.
This patch fixes one of the issues encountered in [1].
[ 83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4
[ 83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]'
[...]
[ 83.964264] Call Trace:
[ 83.964267] <TASK>
[ 83.964269] dump_stack_lvl+0x3f/0xc0
[ 83.964274] __ubsan_handle_out_of_bounds+0xec/0x110
[ 83.964278] ieee80211_prep_hw_scan+0x2db/0x4b0
[ 83.964281] __ieee80211_start_scan+0x601/0x990
[ 83.964291] nl80211_trigger_scan+0x874/0x980
[ 83.964295] genl_family_rcv_msg_doit+0xe8/0x160
[ 83.964298] genl_rcv_msg+0x240/0x270
[...]
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218810
Co-authored-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Kenton Groombridge <concord@gentoo.org>
Link: https://msgid.link/20240605152218.236061-1-concord@gentoo.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.
Slim pickings this time, probably a combination of summer, DevConf.cz,
and the end of first half of the year at corporations.
Current release - regressions:
- Revert "igc: fix a log entry using uninitialized netdev", it traded
lack of netdev name in a printk() for a crash
Previous releases - regressions:
- Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
- geneve: fix incorrectly setting lengths of inner headers in the
skb, confusing the drivers and causing mangled packets
- sched: initialize noop_qdisc owner to avoid false-positive
recursion detection (recursing on CPU 0), which bubbles up to user
space as a sendmsg() error, while noop_qdisc should silently drop
- netdevsim: fix backwards compatibility in nsim_get_iflink()
Previous releases - always broken:
- netfilter: ipset: fix race between namespace cleanup and gc in the
list:set type"
* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
gve: Clear napi->skb before dev_kfree_skb_any()
ionic: fix use after netif_napi_del()
Revert "igc: fix a log entry using uninitialized netdev"
net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
net/ipv6: Fix the RT cache flush via sysctl using a previous delay
net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
gve: ignore nonrelevant GSO type bits when processing TSO headers
net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
netfilter: Use flowlabel flow key when re-routing mangled packets
netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
netfilter: nft_inner: validate mandatory meta and payload
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
mailmap: map Geliang's new email address
mptcp: pm: update add_addr counters after connect
mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
mptcp: ensure snd_una is properly initialized on connect
...
|
|
Pull NFS client fixes from Trond Myklebust:
"Bugfixes:
- NFSv4.2: Fix a memory leak in nfs4_set_security_label
- NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
- NFS: Add appropriate memory barriers to the sillyrename code
- Propagate readlink errors in nfs_symlink_filler
- NFS: don't invalidate dentries on transient errors
- NFS: fix unnecessary synchronous writes in random write workloads
- NFSv4.1: enforce rootpath check when deciding whether or not to trunk
Other:
- Change email address for Trond Myklebust due to email server concerns"
* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: add barriers when testing for NFS_FSDATA_BLOCKED
SUNRPC: return proper error from gss_wrap_req_priv
NFSv4.1 enforce rootpath check in fs_location query
NFS: abort nfs_atomic_open_v23 if name is too long.
nfs: don't invalidate dentries on transient errors
nfs: Avoid flushing many pages with NFS_FILE_SYNC
nfs: propagate readlink errors in nfs_symlink_filler
MAINTAINERS: Change email address for Trond Myklebust
NFSv4: Fix memory leak in nfs4_set_security_label
|
|
Read with MSG_PEEK flag loops if the first byte to read is an OOB byte.
commit 22dd70eb2c3d ("af_unix: Don't peek OOB data without MSG_OOB.")
addresses the loop issue but does not address the issue that no data
beyond OOB byte can be read.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>>
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to vlan
group RCU deref helper to fix the suspicious rcu usage warning.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.
A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.
Fix the problem by switching the order of the two operations.
Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 fixes insufficient sanitization of netlink attributes for the
inner expression which can trigger nul-pointer dereference,
from Davide Ornaghi.
Patch #2 address a report that there is a race condition between
namespace cleanup and the garbage collection of the list:set
type. This patch resolves this issue with other minor issues
as well, from Jozsef Kadlecsik.
Patch #3 ip6_route_me_harder() ignores flowlabel/dsfield when ip dscp
has been mangled, this unbreaks ip6 dscp set $v,
from Florian Westphal.
All of these patches address issues that are present in several releases.
* tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: Use flowlabel flow key when re-routing mangled packets
netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
netfilter: nft_inner: validate mandatory meta and payload
====================
Link: https://lore.kernel.org/r/20240611220323.413713-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a monitor interface is started, ieee80211_recalc_offload() is
called and 802.11 encapsulation offloading support get disabled so
monitor interface could get native wifi frames directly. But when
this interface is stopped there is no need to keep the 802.11
encpasulation offloading off.
This call ieee80211_recalc_offload() when monitor interface is stopped
so 802.11 encapsulation offloading gets re-activated if possible.
Fixes: 6aea26ce5a4c ("mac80211: rework tx encapsulation offload API")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Link: https://msgid.link/840baab454f83718e6e16fd836ac597d924e85b9.1716048326.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In 'cfg80211_wext_siwscan()', add extra check whether number of
channels passed via 'ioctl(sock, SIOCSIWSCAN, ...)' doesn't exceed
IW_MAX_FREQUENCIES and reject invalid request with -EINVAL otherwise.
Reported-by: syzbot+253cd2d2491df77c93ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=253cd2d2491df77c93ac
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://msgid.link/20240531032010.451295-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In nl80211, we always set the ssids of a scan request to
NULL when n_ssids==0 (passive scan). Drivers have relied
on this behaviour in the past, so we fixed it in 6 GHz
scan requests as well, and added a warning so we'd have
assurance the API would always be called that way.
syzbot found that wext doesn't ensure that, so we reach
the check and trigger the warning. Fix the wext code to
set the ssids pointer to NULL when there are none.
Reported-by: syzbot+cd6135193ba6bb9ad158@syzkaller.appspotmail.com
Fixes: f7a8b10bfd61 ("wifi: cfg80211: fix 6 GHz scan request building")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: fix not using correct handle
- L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
- L2CAP: fix connection setup in l2cap_connect
* tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: fix connection setup in l2cap_connect
Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
Bluetooth: hci_sync: Fix not using correct handle
====================
Link: https://lore.kernel.org/r/20240610135803.920662-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'ip6 dscp set $v' in an nftables outpute route chain has no effect.
While nftables does detect the dscp change and calls the reroute hook.
But ip6_route_me_harder never sets the dscp/flowlabel:
flowlabel/dsfield routing rules are ignored and no reroute takes place.
Thanks to Yi Chen for an excellent reproducer script that I used
to validate this change.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Lion Ackermann reported that there is a race condition between namespace cleanup
in ipset and the garbage collection of the list:set type. The namespace
cleanup can destroy the list:set type of sets while the gc of the set type is
waiting to run in rcu cleanup. The latter uses data from the destroyed set which
thus leads use after free. The patch contains the following parts:
- When destroying all sets, first remove the garbage collectors, then wait
if needed and then destroy the sets.
- Fix the badly ordered "wait then remove gc" for the destroy a single set
case.
- Fix the missing rcu locking in the list:set type in the userspace test
case.
- Use proper RCU list handlings in the list:set type.
The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc).
Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation)
Reported-by: Lion Ackermann <nnamrec@gmail.com>
Tested-by: Lion Ackermann <nnamrec@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check for mandatory netlink attributes in payload and meta expression
when used embedded from the inner expression, otherwise NULL pointer
dereference is possible from userspace.
Fixes: a150d122b6bd ("netfilter: nft_meta: add inner match support")
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Due to timer wheel implementation, a timer will usually fire
after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The creation of new subflows can fail for different reasons. If no
subflow have been created using the received ADD_ADDR, the related
counters should not be updated, otherwise they will never be decremented
for events related to this ID later on.
For the moment, the number of accepted ADD_ADDR is only decremented upon
the reception of a related RM_ADDR, and only if the remote address ID is
currently being used by at least one subflow. In other words, if no
subflow can be created with the received address, the counter will not
be decremented. In this case, it is then important not to increment
pm.add_addr_accepted counter, and not to modify pm.accept_addr bit.
Note that this patch does not modify the behaviour in case of failures
later on, e.g. if the MP Join is dropped or rejected.
The "remove invalid addresses" MP Join subtest has been modified to
validate this case. The broadcast IP address is added before the "valid"
address that will be used to successfully create a subflow, and the
limit is decreased by one: without this patch, it was not possible to
create the last subflow, because:
- the broadcast address would have been accepted even if it was not
usable: the creation of a subflow to this address results in an error,
- the limit of 2 accepted ADD_ADDR would have then been reached.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RmAddr MIB counter is supposed to be incremented once when a valid
RM_ADDR has been received. Before this patch, it could have been
incremented as many times as the number of subflows connected to the
linked address ID, so it could have been 0, 1 or more than 1.
The "RmSubflow" is incremented after a local operation. In this case,
it is normal to tied it with the number of subflows that have been
actually removed.
The "remove invalid addresses" MP Join subtest has been modified to
validate this case. A broadcast IP address is now used instead: the
client will not be able to create a subflow to this address. The
consequence is that when receiving the RM_ADDR with the ID attached to
this broadcast IP address, no subflow linked to this ID will be found.
Fixes: 7a7e52e38a40 ("mptcp: add RM_ADDR related mibs")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt
is properly initialized on connect"). It turns out that syzkaller can
trigger the retransmit after fallback and before processing any other
incoming packet - so that snd_una is still left uninitialized.
Address the issue explicitly initializing snd_una together with snd_nxt
and write_seq.
Suggested-by: Mat Martineau <martineau@kernel.org>
Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace, normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.
Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().
Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08880a8d3703cea8e421a@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The amp_id argument of l2cap_connect() was removed in
commit 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support")
It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie.
non-AMP controller). In the above commit, the code path for amp_id != 0
was preserved, although it should have used the amp_id == 0 one.
Restore the previous behavior of the non-AMP code path, to fix problems
with L2CAP connections.
Fixes: 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.
In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:
TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout
Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
When setting up an advertisement the code shall always attempt to use
the handle set by the instance since it may not be equal to the instance
ID.
Fixes: e77f43d531af ("Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
tcp_v6_syn_recv_sock() calls ip6_dst_store() before
inet_sk(newsk)->pinet6 has been set up.
This means ip6_dst_store() writes over the parent (listener)
np->dst_cookie.
This is racy because multiple threads could share the same
parent and their final np->dst_cookie could be wrong.
Move ip6_dst_store() call after inet_sk(newsk)->pinet6
has been changed and after the copy of parent ipv6_pinfo.
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix an occasional memory overwrite caused by a fix added in 6.10
* tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
|
|
Clang static checker (scan-build) warning:
net/ethtool/ioctl.c:line 2233, column 2
Called function pointer is null (null dereference).
Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
this typo error.
Fixes: 201ed315f967 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot found a race in __fib6_drop_pcpu_from() [1]
If compiler reads more than once (*ppcpu_rt),
second read could read NULL, if another cpu clears
the value in rt6_get_pcpu_route().
Add a READ_ONCE() to prevent this race.
Also add rcu_read_lock()/rcu_read_unlock() because
we rely on RCU protection while dereferencing pcpu_rt.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: netns cleanup_net
RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
RSP: 0018:ffffc900040df070 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
__fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
call_netdevice_notifiers net/core/dev.c:2044 [inline]
dev_close_many+0x333/0x6a0 net/core/dev.c:1585
unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
unregister_netdevice_many net/core/dev.c:11276 [inline]
default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
process_scheduled_works kernel/workqueue.c:3312 [inline]
worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
Let's use READ_ONCE() to read sk->sk_shutdown.
Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.
If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.
Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().
However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().
Thues, unix_state_lock() does not protect qlen.
Let's use skb_queue_empty_lockless() in unix_release_sock().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.
It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().
Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
Now we remove unix_recvq_full() as no one uses it.
Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:
(1) For SOCK_DGRAM, it is a written-once field in unix_create1()
(2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
listener's unix_state_lock() in unix_listen(), and we hold
the lock in unix_stream_connect()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.
Let's use READ_ONCE() in unix_create1().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.
Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.
Let's use READ_ONCE() there.
Note that the result could be inconsistent if the socket is dumped
during the state change. This is common for other SOCK_DIAG and
similar interfaces.
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.
Let's use READ_ONCE() there.
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.
* unix_stream_sendmsg
* unix_stream_read_generic
* unix_seqpacket_sendmsg
* unix_seqpacket_recvmsg
Let's use READ_ONCE() there.
Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().
Let's use READ_ONCE() there.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|