Age | Commit message (Collapse) | Author |
|
marvell/otx2 and mvpp2 do not support setting different
keys for different RSS contexts. Contexts have separate
indirection tables but key is shared with all other contexts.
This is likely fine, indirection table is the most important
piece.
Don't report the key-related parameters from such drivers.
This prevents driver-errors, e.g. otx2 always writes
the main key, even when user asks to change per-context key.
The second reason is that without this change tracking
the keys by the core gets complicated. Even if the driver
correctly reject setting key with rss_context != 0,
change of the main key would have to be reflected in
the XArray for all additional contexts.
Since the additional contexts don't have their own keys
not including the attributes (in Netlink speak) seems
intuitive. ethtool CLI seems to deal with it just fine.
Having to set the flag in majority of the drivers is
a bit tedious but not reporting the key is a safer
default.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
cap_rss_ctx_supported was created because the API for creating
and configuring additional contexts is mux'ed with the normal
RSS API. Presence of ops does not imply driver can actually
support rss_context != 0 (in fact drivers mostly ignore that
field). cap_rss_ctx_supported lets core check that the driver
is context-aware before calling it.
Now that we have .create_rxfh_context, there is no such
ambiguity. We can depend on presence of the op.
Make setting the bit optional.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot exposes a race where a net used by l2tp is removed while an
existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
TUNNEL_DELETE work items to close each tunnel in the net. When these
are run, new SESSION_DELETE work items are queued to delete each
session in the tunnel. This all happens in drain_workqueue. However,
drain_workqueue allows only new work items if they are queued by other
work items which are already in the queue. If pppol2tp_release runs
after drain_workqueue has started, it may queue a SESSION_DELETE work
item, which results in the warning below in drain_workqueue.
Address this by flushing the workqueue before drain_workqueue such
that all queued TUNNEL_DELETE work items run before drain_workqueue is
started. This will queue SESSION_DELETE work items for each session in
the tunnel, hence pppol2tp_release or other API requests won't queue
SESSION_DELETE requests once drain_workqueue is started.
WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
Modules linked in:
CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6aca5e7 #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc
RSP: 0018:ffffc90004607b48 EFLAGS: 00010093
RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000
R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800
R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008
FS: 00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392
pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445
__sock_release net/socket.c:659 [inline]
sock_close+0xbc/0x240 net/socket.c:1421
__fput+0x24a/0x8a0 fs/file_table.c:422
task_work_run+0x24f/0x310 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f061e9779f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf
R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032
R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff
</TASK>
Fixes: fc7ec7f554d7 ("l2tp: delete sessions using work queue")
Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_eth uses old-style dev->stats for fastpath packet/byte
counters. Convert it to use dev->tstats per-cpu counters.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap
refcount_inc. They add no value so just use the refcount APIs directly
and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero
anyway.
Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to
l2tp_tunnel_put and l2tp_session_put to better match their use pairing
various _get getters.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp netlink and procfs/debugfs iterate over tunnel and session lists
to obtain data. They currently use very inefficient get_nth functions
to do so. Replace these with get_next.
For netlink, use nl cb->ctx[] for passing state instead of the
obsolete cb->args[].
l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so
they can be removed.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and
session lists. Since these lists are now implemented using IDR, we can
use IDR get_next APIs to iterate them. Add tunnel/session get_next
functions to do so.
The session get_next functions get the next session in a given tunnel
and need to account for l2tpv2 and l2tpv3 differences:
* l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for
a given tunnel ID, TID, can therefore start with a key given by
TID/0 and finish when the next entry's tunnel ID is not TID. This
is possible only because the tunnel ID part of the key is the upper
16 bits and the session ID part the lower 16 bits; when idr_next
increments the key value, it therefore finds the next sessions of
the current tunnel before those of the next tunnel. Entries with
session ID 0 are always skipped because they are used internally by
pppol2tp.
* l2tpv3 sessions are keyed by session ID. Iteration starts at the
first IDR entry and skips entries where the tunnel does not
match. Iteration must also consider session ID collisions and walk
the list of colliding sessions (if any) for one which matches the
supplied tunnel.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a
hashed list keyed by ID and sk. Although unlikely, if hash keys
collide, it is possible that hash_for_each_possible loops over a
session which doesn't have the ID that we are searching for. So check
for session ID match when looping over possible hash key matches.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_ip[6] have always used global socket tables. It is therefore not
possible to create l2tpip sockets in different namespaces with the
same socket address.
To support this, move l2tpip socket tables to pernet data.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update l2tp to remove the inline keyword from several functions in C
sources, since this is now discouraged.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
|
|
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have intentionally allowed UDP GSO packets marked
CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
checksummed by a software fallback when the egress device lacks these
features.
What was not taken into consideration is that a CHECKSUM_NONE skb can be
handed over to the GSO stack also when the egress device advertises the
tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
This will happen when there are IPv6 extension headers present, which we
check for in __ip6_append_data(). Syzbot has discovered this scenario,
producing a warning as below:
ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
Modules linked in:
CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
[...]
Call Trace:
<TASK>
__skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
__dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xef/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
__do_sys_sendmmsg net/socket.c:2754 [inline]
__se_sys_sendmmsg net/socket.c:2751 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[...]
</TASK>
We are hitting the bad offload warning because when an egress device is
capable of handling segmentation offload requested by
skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
any segment skbs and return NULL. See the skb_gso_ok() branch in
{__udp,tcp,sctp}_gso_segment helpers.
To fix it, force a fallback to software USO when processing a packet with
IPv6 extension headers, since we don't know if these can checksummed by
all devices which offer USO.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-2-f5c5b4149ab9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
UDP segmentation offload inherently depends on checksum offload. It should
not be possible to disable checksum offload while leaving USO enabled.
Enforce this dependency in code.
There is a single tx-udp-segmentation feature flag to indicate support for
both IPv4/6, hence the devices wishing to support USO must offer checksum
offload for both IP versions.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-1-f5c5b4149ab9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently ethtool_set_channel calls separate functions to check whether
the new channel number violates rss configuration or flow steering
configuration.
Very soon we need to check whether the new channel number violates
memory provider configuration as well.
To do all 3 checks cleanly, add a wrapper around
ethtool_get_max_rxnfc_channel() and ethtool_get_max_rxfh_channel(),
which does both checks. We can later extend this wrapper to add the
memory provider check in one place.
Note that in the current code, we put a descriptive genl error message
when we run into issues. To preserve the error message, we pass the
genl_info* to the common helper. The ioctl calls can pass NULL instead.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240808205345.2141858-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To better our unit tests we need code coverage to be part of the kernel.
This patch borrows heavily from how CONFIG_GCOV_PROFILE_FTRACE is
implemented
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A value of 1 doesn't make sense, as it implies the only allowed
context ID is 0, which is reserved for the default context - in
which case the driver should just not claim to support custom
RSS contexts at all.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/c07725b3a3d0b0a63b85e230f9c77af59d4d07f8.1723045898.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently the only opportunity to set sock ops flags dictating
which callbacks fire for a socket is from within a TCP-BPF sockops
program. This is problematic if the connection is already set up
as there is no further chance to specify callbacks for that socket.
Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_setsockopt() and bpf_getsockopt()
to allow users to specify callbacks later, either via an iterator
over sockets or via a socket-specific program triggered by a
setsockopt() on the socket.
Previous discussion on this here [1].
[1] https://lore.kernel.org/bpf/f42f157b-6e52-dd4d-3d97-9b86c84c0b00@oracle.com/
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20240808150558.1035626-2-alan.maguire@oracle.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Link: https://patch.msgid.link/20240808170148.3629934-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The 'at least one change' requirement is not applicable for context
creation, skip the check in such case.
This allows a command such as 'ethtool -X eth0 context new' to work.
The command works by mistake when using older versions of userspace
ethtool due to an incompatibility issue where rxfh.input_xfrm is passed
as zero (unset) instead of RXH_XFRM_NO_CHANGE as done with recent
userspace. This patch does not try to solve the incompatibility issue.
Link: https://lore.kernel.org/netdev/05ae8316-d3aa-4356-98c6-55ed4253c8a7@nvidia.com/
Fixes: 84a1d9c48200 ("net: ethtool: extend RXNFC API to support RSS spreading of filter matches")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20240807173352.3501746-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both ethtool_ops.rxfh_max_context_id and the default value used when
it's not specified are supposed to be exclusive maxima (the former
is documented as such; the latter, U32_MAX, cannot be used as an ID
since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an
inclusive maximum.
Subtract one from 'limit' to produce an inclusive maximum, and pass
that to xa_alloc().
Increase bnxt's max by one to prevent a (very minor) regression, as
BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt
is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a
leftover from old driver code that managed context IDs itself.
Rename rxfh_max_context_id to rxfh_max_num_contexts to make its
semantics (hopefully) more obvious.
Fixes: 847a8ab18676 ("net: ethtool: let the core choose RSS context IDs")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/5a2d11a599aa5b0cc6141072c01accfb7758650c.1723045898.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.
This issue was previously discussed here:
https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/
The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.
============================================
WARNING: possible recursive locking detected
6.10.0+ #34 Not tainted
--------------------------------------------
iperf3/771 is trying to acquire lock:
ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0
but task is already holding lock:
ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(slock-AF_INET/1);
lock(slock-AF_INET/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
10 locks held by iperf3/771:
#0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
#1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
#2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
#3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
#4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
#5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
#6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
#7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
#8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
#9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450
stack backtrace:
CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x69/0xa0
dump_stack+0xc/0x20
__lock_acquire+0x135d/0x2600
? srso_alias_return_thunk+0x5/0xfbef5
lock_acquire+0xc4/0x2a0
? l2tp_xmit_skb+0x243/0x9d0
? __skb_checksum+0xa3/0x540
_raw_spin_lock_nested+0x35/0x50
? l2tp_xmit_skb+0x243/0x9d0
l2tp_xmit_skb+0x243/0x9d0
l2tp_eth_dev_xmit+0x3c/0xc0
dev_hard_start_xmit+0x11e/0x420
sch_direct_xmit+0xc3/0x640
__dev_queue_xmit+0x61c/0x1450
? ip_finish_output2+0xf4c/0x1130
ip_finish_output2+0x6b6/0x1130
? srso_alias_return_thunk+0x5/0xfbef5
? __ip_finish_output+0x217/0x380
? srso_alias_return_thunk+0x5/0xfbef5
__ip_finish_output+0x217/0x380
ip_output+0x99/0x120
__ip_queue_xmit+0xae4/0xbc0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? tcp_options_write.constprop.0+0xcb/0x3e0
ip_queue_xmit+0x34/0x40
__tcp_transmit_skb+0x1625/0x1890
__tcp_send_ack+0x1b8/0x340
tcp_send_ack+0x23/0x30
__tcp_ack_snd_check+0xa8/0x530
? srso_alias_return_thunk+0x5/0xfbef5
tcp_rcv_established+0x412/0xd70
tcp_v4_do_rcv+0x299/0x420
tcp_v4_rcv+0x1991/0x1e10
ip_protocol_deliver_rcu+0x50/0x220
ip_local_deliver_finish+0x158/0x260
ip_local_deliver+0xc8/0xe0
ip_rcv+0xe5/0x1d0
? __pfx_ip_rcv+0x10/0x10
__netif_receive_skb_one_core+0xce/0xe0
? process_backlog+0x28b/0x9f0
__netif_receive_skb+0x34/0xd0
? process_backlog+0x28b/0x9f0
process_backlog+0x2cb/0x9f0
__napi_poll.constprop.0+0x61/0x280
net_rx_action+0x332/0x670
? srso_alias_return_thunk+0x5/0xfbef5
? find_held_lock+0x2b/0x80
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
handle_softirqs+0xda/0x480
? __dev_queue_xmit+0xa2c/0x1450
do_softirq+0xa1/0xd0
</IRQ>
<TASK>
__local_bh_enable_ip+0xc8/0xe0
? __dev_queue_xmit+0xa2c/0x1450
__dev_queue_xmit+0xa48/0x1450
? ip_finish_output2+0xf4c/0x1130
ip_finish_output2+0x6b6/0x1130
? srso_alias_return_thunk+0x5/0xfbef5
? __ip_finish_output+0x217/0x380
? srso_alias_return_thunk+0x5/0xfbef5
__ip_finish_output+0x217/0x380
ip_output+0x99/0x120
__ip_queue_xmit+0xae4/0xbc0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? tcp_options_write.constprop.0+0xcb/0x3e0
ip_queue_xmit+0x34/0x40
__tcp_transmit_skb+0x1625/0x1890
tcp_write_xmit+0x766/0x2fb0
? __entry_text_end+0x102ba9/0x102bad
? srso_alias_return_thunk+0x5/0xfbef5
? __might_fault+0x74/0xc0
? srso_alias_return_thunk+0x5/0xfbef5
__tcp_push_pending_frames+0x56/0x190
tcp_push+0x117/0x310
tcp_sendmsg_locked+0x14c1/0x1740
tcp_sendmsg+0x28/0x40
inet_sendmsg+0x5d/0x90
sock_write_iter+0x242/0x2b0
vfs_write+0x68d/0x800
? __pfx_sock_write_iter+0x10/0x10
ksys_write+0xc8/0xf0
__x64_sys_write+0x3d/0x50
x64_sys_call+0xfaf/0x1f50
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f4d143af992
Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
</TASK>
Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: gnault@redhat.com
CC: cong.wang@bytedance.com
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: avoid dup filtering when passive scanning with adv monitor
- hci_qca: don't call pwrseq_power_off() twice for QCA6390
- hci_qca: fix QCA6390 support on non-DT platforms
- hci_qca: fix a NULL-pointer derefence at shutdown
- l2cap: always unlock channel in l2cap_conless_channel()
* tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
====================
Link: https://patch.msgid.link/20240807210103.142483-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of fallback reasons defined in the smc_clc.h file has reached
36. For historical reasons, some are no longer quoted, and there's 33
actually in use. So, add the max value of fallback reason count to 36.
Fixes: 6ac1e6563f59 ("net/smc: support smc v2.x features validate")
Fixes: 7f0620b9940b ("net/smc: support max connections per lgr negotiation")
Fixes: 69b888e3bb4b ("net/smc: support max links per lgr negotiation in clc handshake")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20240805043856.565677-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This restores behaviour (including the comment) from now-removed
hci_request.c, and also matches existing code for active scanning.
Without this, the duplicates filter is always active when passive
scanning, which makes it impossible to work with devices that send
nontrivial dynamic data in their advertisement reports.
Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Add missing call to 'l2cap_chan_unlock()' on receive error handling
path in 'l2cap_conless_channel()'.
Fixes: a24cce144b98 ("Bluetooth: Fix reference counting of global L2CAP channels")
Reported-by: syzbot+45ac74737e866894acb0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=45ac74737e866894acb0
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Now it's time to let it work by using the 'reason' parameter in
the trace world :)
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
reset
When user tries to disconnect a socket and there are more data written
into tcp write queue, we should tell users about this reset reason.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introducing this to show the users the reason of keepalive timeout.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introducing a new type TCP_STATE to handle some reset conditions
appearing in RFC 793 due to its socket state. Actually, we can look
into RFC 9293 which has no discrepancy about this part.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introducing a new type TCP_ABORT_ON_MEMORY for tcp reset reason to handle
out of memory case.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introducing a new type TCP_ABORT_ON_LINGER for tcp reset reason to handle
negative linger value case.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introducing a new type TCP_ABORT_ON_CLOSE for tcp reset reason to handle
the case where more data is unread in closing phase.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ethtool_cmis_page_fini() is declared but never implemented.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://patch.msgid.link/20240803112213.4044015-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using clamp instead of min(max()) is easier to read and it matches even
better the comment just above it.
It also reduces the size of the preprocessed files by ~ 2.5 ko.
(see [1] for a discussion about it)
$ ls -l net/ipv4/tcp_htcp*.i
5576024 27 juil. 10:19 net/ipv4/tcp_htcp.old.i
5573550 27 juil. 10:21 net/ipv4/tcp_htcp.new.i
[1]: https://lore.kernel.org/all/23bdb6fc8d884ceebeb6e8b8653b8cfe@AcuMS.aculab.com/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/561bb4974499a328ac39aff31858465d9bd12b1c.1722752370.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
linkwatch_event() grabs possibly very contended RTNL mutex.
system_wq is not suitable for such work.
Inspired by many noisy syzbot reports.
3 locks held by kworker/0:7/5266:
#0: ffff888015480948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3206 [inline]
#0: ffff888015480948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3312
#1: ffffc90003f6fd00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3207 [inline]
, at: process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3312
#2: ffffffff8fa6f208 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xe/0x60 net/core/link_watch.c:276
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240805085821.1616528-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a warning in bcm_connect() below. [0]
The repro calls connect() to vxcan1, removes vxcan1, and calls
connect() with ifindex == 0.
Calling connect() for a BCM socket allocates a proc entry.
Then, bcm_sk(sk)->bound is set to 1 to prevent further connect().
However, removing the bound device resets bcm_sk(sk)->bound to 0
in bcm_notify().
The 2nd connect() tries to allocate a proc entry with the same
name and sets NULL to bcm_sk(sk)->bcm_proc_read, leaking the
original proc entry.
Since the proc entry is available only for connect()ed sockets,
let's clean up the entry when the bound netdev is unregistered.
[0]:
proc_dir_entry 'can-bcm/2456' already registered
WARNING: CPU: 1 PID: 394 at fs/proc/generic.c:376 proc_register+0x645/0x8f0 fs/proc/generic.c:375
Modules linked in:
CPU: 1 PID: 394 Comm: syz-executor403 Not tainted 6.10.0-rc7-g852e42cc2dd4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:proc_register+0x645/0x8f0 fs/proc/generic.c:375
Code: 00 00 00 00 00 48 85 ed 0f 85 97 02 00 00 4d 85 f6 0f 85 9f 02 00 00 48 c7 c7 9b cb cf 87 48 89 de 4c 89 fa e8 1c 6f eb fe 90 <0f> 0b 90 90 48 c7 c7 98 37 99 89 e8 cb 7e 22 05 bb 00 00 00 10 48
RSP: 0018:ffa0000000cd7c30 EFLAGS: 00010246
RAX: 9e129be1950f0200 RBX: ff1100011b51582c RCX: ff1100011857cd80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffd400000000000f R09: ff1100013e78cac0
R10: ffac800000cd7980 R11: ff1100013e12b1f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ff1100011a99a2ec
FS: 00007fbd7086f740(0000) GS:ff1100013fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200071c0 CR3: 0000000118556004 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
proc_create_net_single+0x144/0x210 fs/proc/proc_net.c:220
bcm_connect+0x472/0x840 net/can/bcm.c:1673
__sys_connect_file net/socket.c:2049 [inline]
__sys_connect+0x5d2/0x690 net/socket.c:2066
__do_sys_connect net/socket.c:2076 [inline]
__se_sys_connect net/socket.c:2073 [inline]
__x64_sys_connect+0x8f/0x100 net/socket.c:2073
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fbd708b0e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007fff8cd33f08 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbd708b0e5d
RDX: 0000000000000010 RSI: 0000000020000040 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000040 R09: 0000000000000040
R10: 0000000000000040 R11: 0000000000000246 R12: 00007fff8cd34098
R13: 0000000000401280 R14: 0000000000406de8 R15: 00007fbd70ab9000
</TASK>
remove_proc_entry: removing non-empty directory 'net/can-bcm', leaking at least '2456'
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20240722192842.37421-1-kuniyu@amazon.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
syzbot hit a use-after-free[1] which is caused because the bridge doesn't
make sure that all previous garbage has been collected when removing a
port. What happens is:
CPU 1 CPU 2
start gc cycle remove port
acquire gc lock first
wait for lock
call br_multicasg_gc() directly
acquire lock now but free port
the port can be freed
while grp timers still
running
Make sure all previous gc cycles have finished by using flush_work before
freeing the port.
[1]
BUG: KASAN: slab-use-after-free in br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
Read of size 8 at addr ffff888071d6d000 by task syz.5.1232/9699
CPU: 1 PID: 9699 Comm: syz.5.1232 Not tainted 6.10.0-rc5-syzkaller-00021-g24ca36a562d6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
call_timer_fn+0x1a3/0x610 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers+0x74b/0xaf0 kernel/time/timer.c:2417
__run_timer_base kernel/time/timer.c:2428 [inline]
__run_timer_base kernel/time/timer.c:2421 [inline]
run_timer_base+0x111/0x190 kernel/time/timer.c:2437
Reported-by: syzbot+263426984509be19c9a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=263426984509be19c9a0
Fixes: e12cec65b554 ("net: bridge: mcast: destroy all entries via gc")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240802080730.3206303-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following helpers do not touch their 'struct net' argument.
- udp6_lib_lookup()
- __udp6_lib_lookup()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802134029.3748005-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following helpers do not touch their struct net argument:
- bpf_sk_lookup_run_v6()
- __inet6_lookup_established()
- inet6_lookup_reuseport()
- inet6_lookup_listener()
- inet6_lookup_run_sk_lookup()
- __inet6_lookup()
- inet6_lookup()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802134029.3748005-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following helpers do not touch their 'struct net' argument.
- udp_sk_bound_dev_eq()
- udp4_lib_lookup()
- __udp4_lib_lookup()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802134029.3748005-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following helpers do not touch their struct net argument:
- bpf_sk_lookup_run_v4()
- inet_lookup_reuseport()
- inet_lhash2_lookup()
- inet_lookup_run_sk_lookup()
- __inet_lookup_listener()
- __inet_lookup_established()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802134029.3748005-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_sk_bound_dev_eq() and its callers do not modify the net structure.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240802134029.3748005-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
build_skb() and frag allocations done with GFP_ATOMIC will
fail in real life, when system is under memory pressure,
and there's nothing we can do about that. So no point
printing warnings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The lifetime of TCP-AO static_key is the same as the last
tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
with RCU grace period, while tcp-ao static branch is currently deferred
destructed. The static key definition is
: DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);
which means that if RCU grace period is delayed by more than a second
and tcp_ao_needed is in the process of disablement, other CPUs may
yet see tcp_ao_info which atent dead, but soon-to-be.
And that breaks the assumption of static_key_fast_inc_not_disabled().
See the comment near the definition:
> * The caller must make sure that the static key can't get disabled while
> * in this function. It doesn't patch jump labels, only adds a user to
> * an already enabled static key.
Originally it was introduced in commit eb8c507296f6 ("jump_label:
Prevent key->enabled int overflow"), which is needed for the atomic
contexts, one of which would be the creation of a full socket from a
request socket. In that atomic context, it's known by the presence
of the key (md5/ao) that the static branch is already enabled.
So, the ref counter for that static branch is just incremented
instead of holding the proper mutex.
static_key_fast_inc_not_disabled() is just a helper for such usage
case. But it must not be used if the static branch could get disabled
in parallel as it's not protected by jump_label_mutex and as a result,
races with jump_label_update() implementation details.
Happened on netdev test-bot[1], so not a theoretical issue:
[] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
[] ------------[ cut here ]------------
[] kernel BUG at arch/x86/kernel/jump_label.c:73!
[] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
[] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[] Workqueue: events jump_label_update_timeout
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
...
[] Call Trace:
[] <TASK>
[] arch_jump_label_transform_queue+0x6c/0x110
[] __jump_label_update+0xef/0x350
[] __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
[] jump_label_update_timeout+0x2c/0x40
[] process_one_work+0xe3b/0x1670
[] worker_thread+0x587/0xce0
[] kthread+0x28a/0x350
[] ret_from_fork+0x31/0x70
[] ret_from_fork_asm+0x1a/0x30
[] </TASK>
[] Modules linked in: veth
[] ---[ end trace 0000000000000000 ]---
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr
Cc: stable@kernel.org
Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 7c3f1875c66f ("net: move somaxconn init from sysctl code")
introduced net_defaults_ops to make sure that net.core sysctl knobs
are always initialised even if CONFIG_SYSCTL is disabled.
Such operations better fit preinit_net() added for a similar purpose
by commit 6e77a5a4af05 ("net: initialize net->notrefcnt_tracker earlier").
Let's initialise the sysctl defaults in preinit_net().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most initialisations in setup_net() do not require pernet_ops_rwsem
and can be moved to preinit_net().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When initialising the root netns, we call preinit_net() under
pernet_ops_rwsem.
However, the operations in preinit_net() do not require pernet_ops_rwsem.
Also, we don't hold it for preinit_net() when initialising non-root netns.
To be consistent, let's call preinit_net() without pernet_ops_rwsem in
net_ns_init().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When initialising the root netns, we set net->passive in setup_net().
However, we do it twice for non-root netns in copy_net_ns() and
setup_net().
This is because we could bypass setup_net() in copy_net_ns() if
down_read_killable() fails.
preinit_net() is a better place to put such an operation.
Let's initialise net->passive in preinit_net().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can allocate per-netns memory for struct pernet_operations by specifying
id and size.
register_pernet_operations() assigns an id to pernet_operations and later
ops_init() allocates the specified size of memory as net->gen->ptr[id].
If id is missing, no memory is allocated. If size is not specified,
pernet_operations just wastes an entry of net->gen->ptr[] for every netns.
net_generic is available only when both id and size are specified, so let's
ensure that.
While we are at it, we add const to both fields.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and
ppp parts") converted net->gen->ptr[pppol2tp_net_id] in l2tp_ppp.c to
net->gen->ptr[l2tp_net_id] in l2tp_core.c.
Now the leftover wastes one entry of net->gen->ptr[] in each netns.
Let's avoid the unwanted allocation.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|