Age | Commit message (Collapse) | Author |
|
As reported by Christoph, the mptcp protocol can run the
worker when the relevant msk socket is in an unexpected state:
connect()
// incoming reset + fastclose
// the mptcp worker is scheduled
mptcp_disconnect()
// msk is now CLOSED
listen()
mptcp_worker()
Leading to the following splat:
divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018
RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293
RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004
RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000
R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_select_window net/ipv4/tcp_output.c:262 [inline]
__tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345
tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459
mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline]
mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705
process_one_work+0x3bd/0x950 kernel/workqueue.c:2390
worker_thread+0x5b/0x610 kernel/workqueue.c:2537
kthread+0x138/0x170 kernel/kthread.c:376
ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
</TASK>
This change addresses the issue explicitly checking for bad states
before running the mptcp worker.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.
The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.
Fixes: ba8f48f7a4d7 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The usage_data[] array in rfc6803_encrypt_case() is uninitialised, so clear
it as it may cause the tests to fail otherwise.
Fixes: b958cff6b27b ("SUNRPC: Add encryption KUnit tests for the RFC 6803 encryption types")
Link: https://lore.kernel.org/r/380323.1681314997@warthog.procyon.org.uk/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Scott Mayhew <smayhew@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: linux-nfs@vger.kernel.org
cc: linux-crypto@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Remove redundant (base_addr >= pool->addrs_cnt) comparison from the
conditional.
In particular, addr is computed as:
addr = base_addr + offset
... where base_addr and offset are stored as 48-bit and 16-bit unsigned
integers, respectively. The above sum cannot overflow u64 since base_addr
has a maximum value of 0x0000ffffffffffff and offset has a maximum value
of 0xffff (implying a maximum sum of 0x000100000000fffe). Since overflow
is impossible, it follows that addr >= base_addr.
Now if (base_addr >= pool->addrs_cnt), then clearly:
addr >= base_addr
>= pool->addrs_cnt
Thus, (base_addr >= pool->addrs_cnt) implies (addr >= pool->addrs_cnt).
Subsequently, the former comparison is unnecessary in the conditional
since for any boolean expressions A and B, (A || B) && (A -> B) is
equivalent to B.
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230411130025.19704-1-kal.conley@dectris.com
|
|
Perform the chunk boundary check like the page boundary check in
xp_desc_crosses_non_contig_pg(). This simplifies the implementation and
reduces the number of branches.
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230410121841.643254-1-kal.conley@dectris.com
|
|
Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
checks the pos against the end of the chunk. However, the data left for
the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
it as struct sctp_ifwdtsn_skip may cause coverflow.
This patch fixes it by checking the pos against "the end of the chunk -
sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
sctp_fwdtsn_skip.
Fixes: 0fc2ea922c8a ("sctp: implement validate_ftsn for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Syzbot reported a bug as following:
=====================================================
BUG: KMSAN: uninit-value in qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
qrtr_endpoint_post+0xf85/0x11b0 net/qrtr/af_qrtr.c:519
qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at:
slab_post_alloc_hook mm/slab.h:766 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
kmalloc_reserve net/core/skbuff.c:492 [inline]
__alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
__netdev_alloc_skb+0x120/0x7d0 net/core/skbuff.c:630
qrtr_endpoint_post+0xbd/0x11b0 net/qrtr/af_qrtr.c:446
qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
It is because that skb->len requires at least sizeof(struct qrtr_ctrl_pkt)
in qrtr_tx_resume(). And skb->len equals to size in qrtr_endpoint_post().
But size is less than sizeof(struct qrtr_ctrl_pkt) when qrtr_cb->type
equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() under the syzbot
scenario. This triggers the uninit variable access bug.
Add size check when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in
qrtr_endpoint_post() to fix the bug.
Fixes: 5fdeb0d372ab ("net: qrtr: Implement outgoing flow control")
Reported-by: syzbot+4436c9630a45820fda76@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c14607f0963d27d5a3d5f4c8639b500909e43540
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230410012352.3997823-1-william.xuanziyang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The commits referenced below allows userspace to use the NLM_F_ECHO flag
for RTM_NEW/DELLINK operations to receive unicast notifications for the
affected link. Prior to these changes, applications may have relied on
multicast notifications to learn the same information without specifying
the NLM_F_ECHO flag.
For such applications, the mentioned commits changed the behavior for
requests not using NLM_F_ECHO. Multicast notifications are still received,
but now use the portid of the requester and the sequence number of the
request instead of zero values used previously. For the application, this
message may be unexpected and likely handled as a response to the
NLM_F_ACKed request, especially if it uses the same socket to handle
requests and notifications.
To fix existing applications relying on the old notification behavior,
set the portid and sequence number in the notification only if the
request included the NLM_F_ECHO flag. This restores the old behavior
for applications not using it, but allows unicasted notifications for
others.
Fixes: f3a63cce1b4f ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
Fixes: d88e136cab37 ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add two new kfuncs that allow a BPF tc-hook, installed on an ipip
device in collect-metadata mode, to control FOU encap parameters on a
per-packet level. The set of kfuncs is registered with the fou module.
The bpf_skb_set_fou_encap kfunc is supposed to be used in tandem and after
a successful call to the bpf_skb_set_tunnel_key bpf-helper. UDP source and
destination ports can be controlled by passing a struct bpf_fou_encap. A
source port of zero will auto-assign a source port. enum bpf_fou_encap_type
is used to specify if the egress path should FOU or GUE encap the packet.
On the ingress path bpf_skb_get_fou_encap can be used to read UDP source
and destination ports from the receiver's point of view and allows for
packet multiplexing across different destination ports within a single
BPF program and ipip device.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Link: https://lore.kernel.org/r/e17c94a646b63e78ce0dbf3f04b2c33dc948a32d.1680874078.git.cehrig@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Today ipip devices in collect-metadata mode don't allow for sending FOU
or GUE encapsulated packets. This patch lifts the restriction by adding
a struct ip_tunnel_encap to the tunnel metadata.
On the egress path, the members of this struct can be set by the
bpf_skb_set_fou_encap kfunc via a BPF tc-hook. Instead of dropping packets
wishing to use additional UDP encapsulation, ip_md_tunnel_xmit now
evaluates the contents of this struct and adds the corresponding FOU or
GUE header. Furthermore, it is making sure that additional header bytes
are taken into account for PMTU discovery.
On the ingress path, an ipip device in collect-metadata mode will fill this
struct and a BPF tc-hook can obtain the information via a call to the
bpf_skb_get_fou_encap kfunc.
The minor change to ip_tunnel_encap, which now takes a pointer to
struct ip_tunnel_encap instead of struct ip_tunnel, allows us to control
FOU encap type and parameters on a per packet-level.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Link: https://lore.kernel.org/r/cfea47de655d0f870248abf725932f851b53960a.1680874078.git.cehrig@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When huang uses sched_switch tracepoint, the tracepoint
does only one thing in the mounted ebpf program, which
deletes the fixed elements in sockhash ([0])
It seems that elements in sockhash are rarely actively
deleted by users or ebpf program. Therefore, we do not
pay much attention to their deletion. Compared with hash
maps, sockhash only provides spin_lock_bh protection.
This causes it to appear to have self-locking behavior
in the interrupt context.
[0]:https://lore.kernel.org/all/CABcoxUayum5oOqFMMqAeWuS8+EzojquSOSyDA3J_2omY=2EeAg@mail.gmail.com/
Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230406122622.109978-1-liuxin350@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With Eric's ref tracker, syzbot finally found a repro for
use-after-free in tcp_write_timer_handler() by kernel TCP
sockets. [0]
If SMC creates a kernel socket in __smc_create(), the kernel
socket is supposed to be freed in smc_clcsock_release() by
calling sock_release() when we close() the parent SMC socket.
However, at the end of smc_clcsock_release(), the kernel
socket's sk_state might not be TCP_CLOSE. This means that
we have not called inet_csk_destroy_sock() in __tcp_close()
and have not stopped the TCP timers.
The kernel socket's TCP timers can be fired later, so we
need to hold a refcnt for net as we do for MPTCP subflows
in mptcp_subflow_create_socket().
[0]:
leaked reference.
sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108)
inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244)
__sock_create (net/socket.c:1546)
smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284)
__sock_create (net/socket.c:1546)
__sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661)
__x64_sys_socket (net/socket.c:1672)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
==================================================================
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091
CPU: 0 PID: 18091 Comm: syzrepro Tainted: G W 6.3.0-rc4-01174-gb5d54eb5899a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:107)
print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
kasan_report (mm/kasan/report.c:538)
tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643)
call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
__run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
run_timer_softirq (kernel/time/timer.c:2037)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
__irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
irq_exit_rcu (kernel/softirq.c:664)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14))
</IRQ>
Fixes: ac7138746e14 ("smc: establish new socket family")
Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These are not as critical as the FDB/MDB trace points (I'm not aware of
outstanding VLAN related bugs), but maybe they are useful to somebody,
either debugging something or simply trying to learn more.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DSA performs non-trivial housekeeping of unicast and multicast addresses
on shared (CPU and DSA) ports, and puts a bit of pressure on higher
layers, requiring them to behave correctly (remove these addresses
exactly as many times as they were added). Otherwise, either addresses
linger around forever, or DSA returns -ENOENT complaining that entries
that were already deleted must be deleted again.
To aid debugging, introduce some trace points specifically for FDB and
MDB - that's where some of the bugs still are right now.
Some bugs I have seen were also due to race conditions, see:
- 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path")
- a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN")
so it would be good to not disturb the timing too much, hence the choice
to use trace points vs regular dev_dbg().
I've had these for some time on my computer in a less polished form, and
they've proven useful. What I found most useful was to enable
CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline,
and run "cat /sys/kernel/debug/tracing/trace". This is to debug more
complex environments with network managers started by the init system,
things like that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix not setting Dath Path for broadcast sink
- Fix not cleaning up on LE Connection failure
- SCO: Fix possible circular locking dependency
- L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
- Fix race condition in hidp_session_thread
- btbcm: Fix logic error in forming the board name
- btbcm: Fix use after free in btsdio_remove
* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Bluetooth: Set ISO Data Path on broadcast sink
Bluetooth: hci_conn: Fix possible UAF
Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
bluetooth: btbcm: Fix logic error in forming the board name.
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
Bluetooth: Fix race condition in hidp_session_thread
Bluetooth: Fix printing errors if LE Connection times out
Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================
Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding verifier test for accessing u32 pointer argument in
tracing programs.
The test program loads 1nd argument of bpf_fentry_test9 function
which is u32 pointer and checks that verifier allows that.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230410085908.98493-3-zhoufeng.zf@bytedance.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fixes from Eric Van Hensbergen:
"These are some collected fixes for the 6.3-rc series that have been
passed our 9p regression tests and been in for-next for at least a
week.
They include a fix for a KASAN reported problem in the extended
attribute handling code and a use after free in the xen transport.
This also includes some updates for the MAINTAINERS file including the
transition of our development mailing list from sourceforge.net to
lists.linux.dev"
* tag '9p-6.3-fixes-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
Update email address and mailing list for v9fs
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
9P FS: Fix wild-memory-access write in v9fs_get_acl
|
|
Similar to commit d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free
caused by l2cap_chan_put"), just use l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.
Cc: stable@kernel.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Min Li <lm0963hack@gmail.com>
|
|
This patch enables ISO data rx on broadcast sink.
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes the following trace:
==================================================================
BUG: KASAN: slab-use-after-free in hci_conn_del+0xba/0x3a0
Write of size 8 at addr ffff88800208e9c8 by task iso-tester/31
CPU: 0 PID: 31 Comm: iso-tester Not tainted 6.3.0-rc2-g991aa4a69a47
#4716
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc36
04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x1d/0x70
print_report+0xce/0x610
? __virt_addr_valid+0xd4/0x150
? hci_conn_del+0xba/0x3a0
kasan_report+0xdd/0x110
? hci_conn_del+0xba/0x3a0
hci_conn_del+0xba/0x3a0
hci_conn_hash_flush+0xf2/0x120
hci_dev_close_sync+0x388/0x920
hci_unregister_dev+0x122/0x260
vhci_release+0x4f/0x90
__fput+0x102/0x430
task_work_run+0xf1/0x160
? __pfx_task_work_run+0x10/0x10
? mark_held_locks+0x24/0x90
exit_to_user_mode_prepare+0x170/0x180
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x4e/0x90
entry_SYSCALL_64_after_hwframe+0x70/0xda
Fixes: 0f00cd322d22 ("Bluetooth: Free potentially unfreed SCO connection")
Link: https://syzkaller.appspot.com/bug?extid=8bb72f86fc823817bc5d
Cc: <stable@vger.kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This attempts to fix the following trace:
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted
------------------------------------------------------
sco-tester/31 is trying to acquire lock:
ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at:
sco_sock_getsockopt+0x1fc/0xa90
but task is already holding lock:
ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_sock_getsockopt+0x104/0xa90
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
lock_sock_nested+0x32/0x80
sco_connect_cfm+0x118/0x4a0
hci_sync_conn_complete_evt+0x1e6/0x3d0
hci_event_packet+0x55c/0x7c0
hci_rx_work+0x34c/0xa00
process_one_work+0x575/0x910
worker_thread+0x89/0x6f0
kthread+0x14e/0x180
ret_from_fork+0x2b/0x50
-> #1 (hci_cb_list_lock){+.+.}-{3:3}:
__mutex_lock+0x13b/0xcc0
hci_sync_conn_complete_evt+0x1ad/0x3d0
hci_event_packet+0x55c/0x7c0
hci_rx_work+0x34c/0xa00
process_one_work+0x575/0x910
worker_thread+0x89/0x6f0
kthread+0x14e/0x180
ret_from_fork+0x2b/0x50
-> #0 (&hdev->lock){+.+.}-{3:3}:
__lock_acquire+0x18cc/0x3740
lock_acquire+0x151/0x3a0
__mutex_lock+0x13b/0xcc0
sco_sock_getsockopt+0x1fc/0xa90
__sys_getsockopt+0xe9/0x190
__x64_sys_getsockopt+0x5b/0x70
do_syscall_64+0x42/0x90
entry_SYSCALL_64_after_hwframe+0x70/0xda
other info that might help us debug this:
Chain exists of:
&hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
lock(hci_cb_list_lock);
lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
lock(&hdev->lock);
*** DEADLOCK ***
1 lock held by sco-tester/31:
#0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0},
at: sco_sock_getsockopt+0x104/0xa90
Fixes: 248733e87d50 ("Bluetooth: Allow querying of supported offload codecs over SCO socket")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This attempts to fix the following trace:
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g0b93eeba4454 #4703 Not tainted
------------------------------------------------------
kworker/u3:0/46 is trying to acquire lock:
ffff888001fd9130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_connect_cfm+0x118/0x4a0
but task is already holding lock:
ffffffff831e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_sync_conn_complete_evt+0x1ad/0x3d0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (hci_cb_list_lock){+.+.}-{3:3}:
__mutex_lock+0x13b/0xcc0
hci_sync_conn_complete_evt+0x1ad/0x3d0
hci_event_packet+0x55c/0x7c0
hci_rx_work+0x34c/0xa00
process_one_work+0x575/0x910
worker_thread+0x89/0x6f0
kthread+0x14e/0x180
ret_from_fork+0x2b/0x50
-> #1 (&hdev->lock){+.+.}-{3:3}:
__mutex_lock+0x13b/0xcc0
sco_sock_connect+0xfc/0x630
__sys_connect+0x197/0x1b0
__x64_sys_connect+0x37/0x50
do_syscall_64+0x42/0x90
entry_SYSCALL_64_after_hwframe+0x70/0xda
-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
__lock_acquire+0x18cc/0x3740
lock_acquire+0x151/0x3a0
lock_sock_nested+0x32/0x80
sco_connect_cfm+0x118/0x4a0
hci_sync_conn_complete_evt+0x1e6/0x3d0
hci_event_packet+0x55c/0x7c0
hci_rx_work+0x34c/0xa00
process_one_work+0x575/0x910
worker_thread+0x89/0x6f0
kthread+0x14e/0x180
ret_from_fork+0x2b/0x50
other info that might help us debug this:
Chain exists of:
sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(hci_cb_list_lock);
lock(&hdev->lock);
lock(hci_cb_list_lock);
lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
*** DEADLOCK ***
4 locks held by kworker/u3:0/46:
#0: ffff8880028d1130 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
process_one_work+0x4c0/0x910
#1: ffff8880013dfde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
at: process_one_work+0x4c0/0x910
#2: ffff8880025d8070 (&hdev->lock){+.+.}-{3:3}, at:
hci_sync_conn_complete_evt+0xa6/0x3d0
#3: ffffffffb79e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_sync_conn_complete_evt+0x1ad/0x3d0
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
There is a potential race condition in hidp_session_thread that may
lead to use-after-free. For instance, the timer is active while
hidp_del_timer is called in hidp_session_thread(). After hidp_session_put,
then 'session' will be freed, causing kernel panic when hidp_idle_timeout
is running.
The solution is to use del_timer_sync instead of del_timer.
Here is the call trace:
? hidp_session_probe+0x780/0x780
call_timer_fn+0x2d/0x1e0
__run_timers.part.0+0x569/0x940
hidp_session_probe+0x780/0x780
call_timer_fn+0x1e0/0x1e0
ktime_get+0x5c/0xf0
lapic_next_deadline+0x2c/0x40
clockevents_program_event+0x205/0x320
run_timer_softirq+0xa9/0x1b0
__do_softirq+0x1b9/0x641
__irq_exit_rcu+0xdc/0x190
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0xa1/0xc0
Cc: stable@vger.kernel.org
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes errors like bellow when LE Connection times out since that
is actually not a controller error:
Bluetooth: hci0: Opcode 0x200d failed: -110
Bluetooth: hci0: request failed to create LE connection: err -110
Instead the code shall properly detect if -ETIMEDOUT is returned and
send HCI_OP_LE_CREATE_CONN_CANCEL to give up on the connection.
Link: https://github.com/bluez/bluez/issues/340
Fixes: 8e8b92ee60de ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
hci_connect_le_scan_cleanup shall always be invoked to cleanup the
states and re-enable passive scanning if necessary, otherwise it may
cause the pending action to stay active causing multiple attempts to
connect.
Fixes: 9b3628d79b46 ("Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a
netdev notifier for DSA to reject PTP on DSA master"), due to a desire
to convert DSA's attempt to deny TX timestamping on a DSA master to
something that doesn't block the kernel-wide API conversion from
ndo_eth_ioctl() to ndo_hwtstamp_set().
What was required was a mechanism that did not depend on ndo_eth_ioctl(),
and what was provided was a mechanism that did not depend on
ndo_eth_ioctl(), while at the same time introducing something that
wasn't absolutely necessary - a new netdev notifier.
There have been objections from Jakub Kicinski that using notifiers in
general when they are not absolutely necessary creates complications to
the control flow and difficulties to maintainers who look at the code.
So there is a desire to not use notifiers.
In addition to that, the notifier chain gets called even if there is no
DSA in the system and no one is interested in applying any restriction.
Take the model of udp_tunnel_nic_ops and introduce a stub mechanism,
through which net/core/dev_ioctl.c can call into DSA even when
CONFIG_NET_DSA=m.
Compared to the code that existed prior to the notifier conversion, aka
what was added in commits:
- 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops")
- 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers")
this is different because we are not overloading any struct
net_device_ops of the DSA master anymore, but rather, we are exposing a
rather specific functionality which is orthogonal to which API is used
to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set().
Also, what is similar is that both approaches use function pointers to
get from built-in code to DSA.
There is no point in replicating the function pointers towards
__dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr).
Instead, it is sufficient to introduce a singleton struct dsa_stubs,
built into the kernel, which contains a single function pointer to
__dsa_master_hwtstamp_validate().
I find this approach preferable to what we had originally, because
dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through
struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any
attempts to add any data encapsulation and hide DSA data structures from
the outside world.
Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit 217f69743681 ("net: busy-poll: allow preemption
in sk_busy_loop()"), a thread willing to use busy polling
is not hurting other threads anymore in a non preempt kernel.
I think it is safe to remove CAP_NET_ADMIN check.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230406194634.1804691-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
Improve IPsec limits, ESN and replay window
This series overcomes existing hardware limitations in Mellanox ConnectX
devices around handling IPsec soft and hard limits.
In addition, the ESN logic is tied and added an interface to configure
replay window sequence numbers through existing iproute2 interface.
ip xfrm state ... [ replay-seq SEQ ] [ replay-oseq SEQ ]
[ replay-seq-hi SEQ ] [ replay-oseq-hi SEQ ]
Link: https://lore.kernel.org/all/cover.1680162300.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
* tag 'ipsec-esn-replay' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5e: Simulate missing IPsec TX limits hardware functionality
net/mlx5e: Generalize IPsec work structs
net/mlx5e: Reduce contention in IPsec workqueue
net/mlx5e: Set IPsec replay sequence numbers
net/mlx5e: Remove ESN callbacks if it is not supported
xfrm: don't require advance ESN callback for packet offload
net/mlx5e: Overcome slow response for first IPsec ASO WQE
net/mlx5e: Add SW implementation to support IPsec 64 bit soft and hard limits
net/mlx5e: Prevent zero IPsec soft/hard limits
net/mlx5e: Factor out IPsec ASO update function
====================
Link: https://lore.kernel.org/r/20230406071902.712388-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
assume the following setup on a single machine:
1. An openvswitch instance with one bridge and default flows
2. two network namespaces "server" and "client"
3. two ovs interfaces "server" and "client" on the bridge
4. for each ovs interface a veth pair with a matching name and 32 rx and
tx queues
5. move the ends of the veth pairs to the respective network namespaces
6. assign ip addresses to each of the veth ends in the namespaces (needs
to be the same subnet)
7. start some http server on the server network namespace
8. test if a client in the client namespace can reach the http server
when following the actions below the host has a chance of getting a cpu
stuck in a infinite loop:
1. send a large amount of parallel requests to the http server (around
3000 curls should work)
2. in parallel delete the network namespace (do not delete interfaces or
stop the server, just kill the namespace)
there is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.
The series of events happening here is:
1. the network namespace is deleted calling
`unregister_netdevice_many_notify` somewhere in the process
2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
then runs `synchronize_net`
3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
4. this is then handled by `dp_device_event` which calls
`ovs_netdev_detach_dev` (if a vport is found, which is the case for
the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
packages to be sent to the device
6. `dp_device_event` then queues the vport deletion to work in
background as a ovs_lock is needed that we do not hold in the
unregistration path
7. `unregister_netdevice_many_notify` continues to call
`netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport
If after 7. but before 9. a packet is send to the ovs vport (which is
not deleted at this point in time) which forwards it to the
`dev_queue_xmit` flow even though the device is unregistering.
In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is
a while loop (if the packet has a rx_queue recorded) that is infinite if
`dev->real_num_tx_queues` is zero.
To prevent this from happening we update `do_output` to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).
Additionally we now produce a warning in `skb_tx_hash` if we will hit
the infinite loop.
bpftrace (first word is function name):
__dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2
netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
__dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024
ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604
stuck message:
watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279]
Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover
CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu
Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:netdev_pick_tx+0xf1/0x320
Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01
RSP: 0018:ffffb78b40298820 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000
RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000
R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000
FS: 00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0
Call Trace:
<IRQ>
netdev_core_pick_tx+0xa4/0xb0
__dev_queue_xmit+0xf8/0x510
? __bpf_prog_exit+0x1e/0x30
dev_queue_xmit+0x10/0x20
ovs_vport_send+0xad/0x170 [openvswitch]
do_output+0x59/0x180 [openvswitch]
do_execute_actions+0xa80/0xaa0 [openvswitch]
? kfree+0x1/0x250
? kfree+0x1/0x250
? kprobe_perf_func+0x4f/0x2b0
? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
ovs_execute_actions+0x4c/0x120 [openvswitch]
ovs_dp_process_packet+0xa1/0x200 [openvswitch]
? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
? ovs_flow_key_extract+0x2db/0x350 [openvswitch]
ovs_vport_receive+0x77/0xd0 [openvswitch]
? __htab_map_lookup_elem+0x4e/0x60
? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714
? trace_call_bpf+0xc8/0x150
? kfree+0x1/0x250
? kfree+0x1/0x250
? kprobe_perf_func+0x4f/0x2b0
? kprobe_perf_func+0x4f/0x2b0
? __mod_memcg_lruvec_state+0x63/0xe0
netdev_port_receive+0xc4/0x180 [openvswitch]
? netdev_port_receive+0x180/0x180 [openvswitch]
netdev_frame_hook+0x1f/0x40 [openvswitch]
__netif_receive_skb_core.constprop.0+0x23d/0xf00
__netif_receive_skb_one_core+0x3f/0xa0
__netif_receive_skb+0x15/0x60
process_backlog+0x9e/0x170
__napi_poll+0x33/0x180
net_rx_action+0x126/0x280
? ttwu_do_activate+0x72/0xf0
__do_softirq+0xd9/0x2e7
? rcu_report_exp_cpu_mult+0x1b0/0x1b0
do_softirq+0x7d/0xb0
</IRQ>
<TASK>
__local_bh_enable_ip+0x54/0x60
ip_finish_output2+0x191/0x460
__ip_finish_output+0xb7/0x180
ip_finish_output+0x2e/0xc0
ip_output+0x78/0x100
? __ip_finish_output+0x180/0x180
ip_local_out+0x5e/0x70
__ip_queue_xmit+0x184/0x440
? tcp_syn_options+0x1f9/0x300
ip_queue_xmit+0x15/0x20
__tcp_transmit_skb+0x910/0x9c0
? __mod_memcg_state+0x44/0xa0
tcp_connect+0x437/0x4e0
? ktime_get_with_offset+0x60/0xf0
tcp_v4_connect+0x436/0x530
__inet_stream_connect+0xd4/0x3a0
? kprobe_perf_func+0x4f/0x2b0
? aa_sk_perm+0x43/0x1c0
inet_stream_connect+0x3b/0x60
__sys_connect_file+0x63/0x70
__sys_connect+0xa6/0xd0
? setfl+0x108/0x170
? do_fcntl+0xe8/0x5a0
__x64_sys_connect+0x18/0x20
do_syscall_64+0x5c/0xc0
? __x64_sys_fcntl+0xa9/0xd0
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x27/0x50
? do_syscall_64+0x69/0xc0
? __sys_setsockopt+0xea/0x1e0
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x27/0x50
? __x64_sys_setsockopt+0x1f/0x30
? do_syscall_64+0x69/0xc0
? irqentry_exit+0x1d/0x30
? exc_page_fault+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f7b8101c6a7
Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7
RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005
RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e
R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410
R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
</TASK>
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-04-08
We've added 4 non-merge commits during the last 11 day(s) which contain
a total of 5 files changed, 39 insertions(+), 6 deletions(-).
The main changes are:
1) Fix BPF TCP socket iterator to use correct helper for dropping
socket's refcount, that is, sock_gen_put instead of sock_put,
from Martin KaFai Lau.
2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
from Xu Kuohai.
3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.
4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
bpf, arm64: Fixed a BTI error on returning to patched function
LoongArch, bpf: Fix jit to skip speculation barrier opcode
bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
====================
Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
UBSAN: shift-out-of-bounds in net/ipv4/tcp_input.c:555:23
shift exponent 255 is too large for 32-bit type 'int'
CPU: 1 PID: 7907 Comm: ssh Not tainted 6.3.0-rc4-00161-g62bad54b26db-dirty #206
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x136/0x150
__ubsan_handle_shift_out_of_bounds+0x21f/0x5a0
tcp_init_transfer.cold+0x3a/0xb9
tcp_finish_connect+0x1d0/0x620
tcp_rcv_state_process+0xd78/0x4d60
tcp_v4_do_rcv+0x33d/0x9d0
__release_sock+0x133/0x3b0
release_sock+0x58/0x1b0
'maxwin' is int, shifting int for 32 or more bits is undefined behaviour.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
|
|
Make sure unaligned descriptors that straddle the end of the UMEM are
considered invalid. Currently, descriptor validation is broken for
zero-copy mode which only checks descriptors at page granularity.
For example, descriptors in zero-copy mode that overrun the end of the
UMEM but not a page boundary are (incorrectly) considered valid. The
UMEM boundary check needs to happen before the page boundary and
contiguity checks in xp_desc_crosses_non_contig_pg(). Do this check in
xp_unaligned_validate_desc() instead like xp_check_unaligned() already
does.
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230405235920.7305-2-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
For memory alloc that store user data from nla[NFTA_OBJ_USERDATA],
use GFP_KERNEL_ACCOUNT is more suitable.
Fixes: 33758c891479 ("memcg: enable accounting for nft objects")
Signed-off-by: Chen Aotian <chenaotian2@163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Recent attempt to ensure PREROUTING hook is executed again when a
decrypted ipsec packet received on a bridge passes through the network
stack a second time broke the physdev match in INPUT hook.
We can't discard the nf_bridge info strct from sabotage_in hook, as
this is needed by the physdev match.
Keep the struct around and handle this with another conditional instead.
Fixes: 2b272bb558f1 ("netfilter: br_netfilter: disable sabotage_in hook after first suppression")
Reported-and-tested-by: Farid BENAMROUCHE <fariouche@yahoo.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In packet offload mode, the hardware is responsible to manage
replay window and advance ESN. In that mode, there won't any
call to .xdo_dev_state_advance_esn callback.
So relax current check for existence of that callback.
Link: https://lore.kernel.org/r/9f3dfc3fef2cfcd191f0c5eee7cf0aa74e7f7786.1680162300.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
For the sake of readability, use the netlink payload helpers from
the 'nla_get_*()' family to parse the attributes.
tdc results:
1..5
ok 1 9903 - Add mqprio Qdisc to multi-queue device (8 queues)
ok 2 453a - Delete nonexistent mqprio Qdisc
ok 3 5292 - Delete mqprio Qdisc twice
ok 4 45a9 - Add mqprio Qdisc to single-queue device
ok 5 2ba9 - Show mqprio class
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230404203449.1627033-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.3
Smaller pull request this time, sending this early to fix the conflict
in mac80211. Nothing really special this time, only smaller changes.
* enable Wi-Fi 7 (EHT) mesh support
* tag 'wireless-next-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (37 commits)
wifi: rt2x00: Fix memory leak when handling surveys
wifi: b43legacy: Remove the unused function prev_slot()
wifi: rtw89: Remove redundant pci_clear_master
wifi: rtw89: fix potential race condition between napi_init and napi_enable
wifi: rtw89: config EDCCA threshold during scan to prevent TX failed
wifi: rtw89: fix incorrect channel info during scan due to ppdu_sts filtering
wifi: rtw89: remove superfluous H2C of join_info
wifi: rtw89: set data lowest rate according to AP supported rate
wifi: rtw89: add counters of register-based H2C/C2H
wifi: rtw89: coex: Update Wi-Fi Bluetooth coexistence version to 7.0.1
wifi: rtw89: coex: Add report control v5 variation
wifi: rtw89: coex: Update RTL8852B LNA2 hardware parameter
wifi: rtw89: coex: Not to enable firmware report when WiFi is power saving
wifi: rtw89: coex: Add LPS protocol radio state for RTL8852B
bcma: remove unused mips_read32 function
bcma: Use of_address_to_resource()
wifi: mwifiex: remove unused evt_buf variable
wifi: brcmsmac: ampdu: remove unused suc_mpdu variable
wifi: rtlwifi: fix incorrect error codes in rtl_debugfs_set_write_reg()
wifi: rtlwifi: fix incorrect error codes in rtl_debugfs_set_write_rfreg()
...
====================
Link: https://lore.kernel.org/r/20230405111037.4792BC43443@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2023-04-04-2
The first patch is by Oliver Hartkopp and makes the maximum pdu size
of the CAN ISOTP protocol configurable.
The following 5 patches are by Dario Binacchi and add support for the
bxCAN controller by ST.
Geert Uytterhoeven's patch for the rcar_canfd driver fixes a sparse
warning.
Peng Fan's patch adds an optional power-domains property to the
flexcan device tree binding.
Frank Jungclaus adds support for CAN_CTRLMODE_BERR_REPORTING to the
esd_usb driver.
The last patch is by Oliver Hartkopp and converts the USB IDs of the
kvaser_usb driver to hexadecimal values.
* tag 'linux-can-next-for-6.4-20230404-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
kvaser_usb: convert USB IDs to hexadecimal values
can: esd_usb: Add support for CAN_CTRLMODE_BERR_REPORTING
dt-bindings: can: fsl,flexcan: add optional power-domains property
can: rcar_canfd: rcar_canfd_probe(): fix plain integer in transceivers[] init
can: bxcan: add support for ST bxCAN controller
ARM: dts: stm32: add pin map for CAN controller on stm32f4
ARM: dts: stm32: add CAN support on stm32f429
dt-bindings: net: can: add STM32 bxcan DT bindings
dt-bindings: arm: stm32: add compatible for syscon gcan node
can: isotp: add module parameter for maximum pdu size
====================
Link: https://lore.kernel.org/r/20230404145908.1714400-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
function in isotp.c might get into a race condition when restoring the
former tx.state from the old_state.
Remove the old_state concept and implement proper locking for the
ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
simplification idea from Hillf Danton.
Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
mechanism from isotp_release() which resolves a potential race between
isotp_sendsmg() and isotp_release().
[1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet
v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_release()
v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in the wait_tx_done case
v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in ALL cases
Cc: Dae R. Jeong <threeearcat@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: 4f027cba8216 ("can: isotp: split tx timer into transmission and timeout")
Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
When using select()/poll()/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, a false EPOLLOUT event
is sometimes returned. This can happen at least after sending a
message which must be split to multiple CAN frames.
The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.
This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.
v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz
v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.cz
https://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.net
Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
syzbot reported a data-race in data-race in netlink_recvmsg() [1]
Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.
[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0x0000000000001000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the number of lanes was forced and then subsequently the user
omits this parameter, the ksettings->lanes is reset. The driver
should then reset the number of lanes to the device's default
for the specified speed.
However, although the ksettings->lanes is set to 0, the mod variable
is not set to true to indicate the driver and userspace should be
notified of the changes.
The consequence is that the same ethtool operation will produce
different results based on the initial state.
If the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: on
then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
While if the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
This patch fixes this behavior. Omitting lanes will always results in
the driver choosing the default lane width for the chosen speed. In this
scenario, regardless of the initial state, the end state will be, e.g.,
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
Fixes: 012ce4dd3102 ("ethtool: Extend link modes settings uAPI with lanes")
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit dbca1596bbb0 ("ping: convert to RCU lookups, get rid
of rwlock"), we use RCU for ping sockets, but we should use spinlock
for /proc/net/icmp to avoid a potential NULL deref mentioned in
the previous patch.
Let's go back to using spinlock there.
Note we can convert ping sockets to use hlist instead of hlist_nulls
because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Dae R. Jeong reported a NULL deref in raw_get_next() [0].
It seems that the repro was running these sequences in parallel so
that one thread was iterating on a socket that was being freed in
another netns.
unshare(0x40060200)
r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
socket$inet_icmp_raw(0x2, 0x3, 0x1)
pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we
use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
sockets. However, we should use spinlock for slow paths to avoid
the NULL deref.
Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
is not reused during iteration in the grace period. In fact, the
lockless readers do not check the nulls marker with get_nulls_value().
So, SOCK_RAW should use hlist instead of hlist_nulls.
Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
fast paths and sk_for_each() and spinlock for /proc/net/raw.
[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
seq_read+0x224/0x320 fs/seq_file.c:162
pde_read fs/proc/inode.c:316 [inline]
proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
vfs_read+0x31e/0xd30 fs/read_write.c:468
ksys_pread64 fs/read_write.c:665 [inline]
__do_sys_pread64 fs/read_write.c:675 [inline]
__se_sys_pread64 fs/read_write.c:672 [inline]
__x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x478d29
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Dae R. Jeong <threeearcat@gmail.com>
Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers
perform run-time check that sk|inode|task|cgrp pointer != NULL.
Teach verifier about this fact and allow bpf programs to pass
PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers.
It will be used in the subsequent patch that will do
bpf_sk_storage_get(.., skb->sk, ...);
Even when 'skb' pointer is trusted the 'sk' pointer may be NULL.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com
|
|
Remove unused arguments from btf_struct_access() callback.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-3-alexei.starovoitov@gmail.com
|