Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull eprobes fix from Masami Hiramatsu:
- release eprobe when failing to add dyn_event.
This unregisters event call and release eprobe when it fails to add a
dynamic event. Found in cleaning up.
* tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobe: Fix to release eprobe when failed to add dyn_event
|
|
After the blamed commit below, udp_rehash() is supposed to be called
with both local and remote addresses set.
Currently that is already the case for IPv6 sockets, but for IPv4 the
destination address is updated after rehashing.
Address the issue moving the destination address and port initialization
before rehashing.
Fixes: 1b29a730ef8b ("ipv6/udp: Add 4-tuple hash for connected socket")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Koichiro Den says:
====================
virtio_net: correct netdev_tx_reset_queue() invocation points
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
scenarios for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
This patch series resolves the issue and also addresses similar existing
problems:
(a). Drop netdev_tx_reset_queue() from open/close path. This eliminates the
BQL crashes due to the problematic open/close path.
(b). As a result of (a), netdev_tx_reset_queue() is now explicitly required
in freeze/restore path. Add netdev_tx_reset_queue() immediately after
free_unused_bufs() invocation.
(c). Fix missing resetting in virtnet_tx_resize().
virtnet_tx_resize() has lacked proper resetting since commit
c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits").
(d). Fix missing resetting in the XDP_SETUP_XSK_POOL path.
Similar to (c), this path lacked proper resetting. Call
netdev_tx_reset_queue() when virtqueue_reset() has actually recycled
unused buffers.
This patch series consists of six commits:
[1/6]: Resolves (a) and (b). # also -stable 6.11.y
[2/6]: Minor fix to make [4/6] streamlined.
[3/6]: Prerequisite for (c). # also -stable 6.11.y
[4/6]: Resolves (c) (incl. Prerequisite for (d)) # also -stable 6.11.y
[5/6]: Preresuisite for (d).
[6/6]: Resolves (d).
Changes for v4:
- move netdev_tx_reset_queue() out of free_unused_bufs()
- submit to net, not net-next
Changes for v3:
- replace 'flushed' argument with 'recycle_done'
Changes for v2:
- add tx queue resetting for (b) to (d) above
v3: https://lore.kernel.org/all/20241204050724.307544-1-koichiro.den@canonical.com/
v2: https://lore.kernel.org/all/20241203073025.67065-1-koichiro.den@canonical.com/
v1: https://lore.kernel.org/all/20241130181744.3772632-1-koichiro.den@canonical.com/
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicast+0x23e/0x390
? netlink_sendmsg+0x21e/0x490
? ____sys_sendmsg+0x31b/0x350
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? __nla_validate_parse+0x5f/0xee0
? __pfx___probestub_irq_enable+0x3/0x10
? __create_object+0x5e/0x90
? security_capable+0x3b/0x7[I0
rtnl_newlink+0x784/0xaf0
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? stack_depot_save_flags+0x24/0x6d0
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x34f/0x3f0
? do_syscall_64+0x6c/0x180
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? __pfx_rtnetlink_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x23e/0x390
netlink_sendmsg+0x21e/0x490
____sys_sendmsg+0x31b/0x350
? copy_msghdr_from_user+0x6d/0xa0
___sys_sendmsg+0x86/0xd0
? __pte_offset_map+0x17/0x160
? preempt_count_add+0x69/0xa0
? __call_rcu_common.constprop.0+0x147/0x610
? preempt_count_add+0x69/0xa0
? preempt_count_add+0x69/0xa0
? _raw_spin_trylock+0x13/0x60
? trace_hardirqs_on+0x1d/0x80
__sys_sendmsg+0x66/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34
RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003
RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001
R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000
</TASK>
[...]
====================
Link: https://patch.msgid.link/20241206011047.923923-1-koichiro.den@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset when flushing has actually occurred, Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize()
to handle this.
Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when it really occurs.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
handle this.
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when the recycle really occurs.
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
While not harmful, using vq2rxq where it's always sq appears odd.
Replace it with the more appropriate vq2txq for clarity and correctness.
Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
cases for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
the device is not stopped in any case. For BQL core part, it's just like
traffic nearly ceases to exist for some period. For stall detector added
to BQL, even if virtnet_close could somehow lead to some TX completions
delayed for long, followed by virtnet_open, we can just take it as stall
as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
based on BQL"). Note also that users can still reset stall_max via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now
explicitly required in freeze/restore path. This patch adds it to
immediately after free_unused_bufs(), following the rule of thumb:
netdev_tx_reset_queue() should follow any SKB freeing not followed by
netdev_tx_completed_queue(). This seems the most consistent and
streamlined approach, and now netdev_tx_reset_queue() runs whenever
free_unused_bufs() is done.
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicast+0x23e/0x390
? netlink_sendmsg+0x21e/0x490
? ____sys_sendmsg+0x31b/0x350
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? __nla_validate_parse+0x5f/0xee0
? __pfx___probestub_irq_enable+0x3/0x10
? __create_object+0x5e/0x90
? security_capable+0x3b/0x70
rtnl_newlink+0x784/0xaf0
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? stack_depot_save_flags+0x24/0x6d0
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x34f/0x3f0
? do_syscall_64+0x6c/0x180
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? __pfx_rtnetlink_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x23e/0x390
netlink_sendmsg+0x21e/0x490
____sys_sendmsg+0x31b/0x350
? copy_msghdr_from_user+0x6d/0xa0
___sys_sendmsg+0x86/0xd0
? __pte_offset_map+0x17/0x160
? preempt_count_add+0x69/0xa0
? __call_rcu_common.constprop.0+0x147/0x610
? preempt_count_add+0x69/0xa0
? preempt_count_add+0x69/0xa0
? _raw_spin_trylock+0x13/0x60
? trace_hardirqs_on+0x1d/0x80
__sys_sendmsg+0x66/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34
RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003
RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001
R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000
</TASK>
[...]
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
[ pabeni: trimmed possibly troublesome separator ]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Due to target variable is being reassigned in npc_install_flow()
function, PF multicast rules are not getting installed.
This patch addresses the issue by fixing the "IF" condition
checks when rules are installed by AF.
Fixes: 6c40ca957fe5 ("octeontx2-pf: Adds TC offload support").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241205113435.10601-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Stefan Wahren says:
====================
qca_spi: Fix SPI specific issues
This small series address two annoying SPI specific issues of
the qca_spi driver.
====================
Link: https://patch.msgid.link/20241206184643.123399-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The module parameter qcaspi_pluggable controls if QCA7000 signature
should be checked at driver probe (current default) or not. Unfortunately
this could fail in case the chip is temporary in reset, which isn't under
total control by the Linux host. So disable this check per default
in order to avoid unexpected probe failures.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241206184643.123399-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Storing the maximum clock speed in module parameter qcaspi_clkspeed
has the unintended side effect that the first probed instance
defines the value for all other instances. Fix this issue by storing
it in max_speed_hz of the relevant SPI device.
This fix keeps the priority of the speed parameter (module parameter,
device tree property, driver default). Btw this uses the opportunity
to get the rid of the unused member clkspeed.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241206184643.123399-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
uses port number to get mac addr, this leads to error when an attempt
to set MAC address on VF's of PF2 and PF3.
This patch fixes the issue by using port number to set mac address.
Fixes: e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF")
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Borislav Petkov:
- Remove if_not_guard() as it is generating incorrect code
- Fix the initialization of the fake lockdep_map for the first locked
ww_mutex
* tag 'locking_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
headers/cleanup.h: Remove the if_not_guard() facility
locking/ww_mutex: Fix ww_mutex dummy lockdep map selftest warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Make sure the PEBS buffer is drained before reconfiguring the
hardware
- Add Arrow Lake U support
* tag 'perf_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG
perf/x86/intel: Add Arrow Lake U support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Remove wrong enqueueing of a task for a later wakeup when a task
blocks on a RT mutex
- Do not setup a new deadline entity on a boosted task as that has
happened already
- Update preempt= kernel command line param
- Prevent needless softirqd wakeups in the idle task's context
- Detect the case where the idle load balancer CPU becomes busy and
avoid unnecessary load balancing invocation
- Remove an unnecessary load balancing need_resched() call in
nohz_csd_func()
- Allow for raising of SCHED_SOFTIRQ softirq type on RT but retain the
warning to catch any other cases
- Remove a wrong warning when a cpuset update makes the task affinity
no longer a subset of the cpuset
* tag 'sched_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex
sched/deadline: Fix warning in migrate_enable for boosted tasks
sched/core: Update kernel boot parameters for LAZY preempt.
sched/core: Prevent wakeup of ksoftirqd during idle load balance
sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy
sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()
softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
sched: fix warning in sched_setaffinity
sched/deadline: Fix replenish_dl_new_period dl_server condition
|
|
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:
ld: vmlinux.o: in function `virtual_mapped':
linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'
when CONFIG_KEXEC_JUMP is enabled.
This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.
Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).
Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The powerpc user access code is special, and unlike other architectures
distinguishes between user access for reading and writing.
And commit 43a43faf5376 ("futex: improve user space accesses") messed
that up. It went undetected elsewhere, but caused ppc32 to fail early
during boot, because the user access had been started with
user_read_access_begin(), but then finished off with just a plain
"user_access_end()".
Note that the address-masking user access helpers don't even have that
read-vs-write distinction, so if powerpc ever wants to do address
masking tricks, we'll have to do some extra work for it.
[ Make sure to also do it for the EFAULT case, as pointed out by
Christophe Leroy ]
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/87bjxl6b0i.fsf@igel.home/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Daniel Machon says:
====================
net: sparx5: misc fixes for sparx5 and lan969x
This series fixes various issues in the Sparx5 and lan969x drivers. Most
of the fixes are for new issues introduced by the recent series adding
lan969x switch support in the Sparx5 driver.
Most notable is patch 1/5 that moves the lan969x dir into the sparx5
dir, in order to address a cyclic dependency issue reported by depmod,
when installing modules. Details are in the commit descriptions.
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Lars Povlsen <lars.povlsen@microchip.com>
To: Steen Hegelund <Steen.Hegelund@microchip.com>
To: UNGLinuxDriver@microchip.com
To: Richard Cochran <richardcochran@gmail.com>
To: Bjarni Jonasson <bjarni.jonasson@microchip.com>
To: jensemil.schulzostergaard@microchip.com
To: horatiu.vultur@microchip.com
To: arnd@arndb.de
To: jacob.e.keller@intel.com
To: Parthiban.Veerasooran@microchip.com
Cc: Calvin Owens <calvin@wbinvd.org>
Cc: Muhammad Usama Anjum <Usama.Anjum@collabora.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
====================
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On port initialization, we configure the maximum frame length accepted
by the receive module associated with the port. This value is currently
written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in
fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix
this.
Fixes: 946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When doing port mirroring, the physical port to send the frame to, is
written to the FRMC_PORT_VAL field of the QFWD_FRAME_COPY_CFG register.
This field is 7 bits wide on sparx5 and 6 bits wide on lan969x, and has
a default value of 65 and 30, respectively (the number of front ports).
On mirror deletion, we set the default value of the monitor port to
65 for this field, in case no more ports exists for the mirror. Needless
to say, this will not fit the 6 bits on lan969x.
Fix this by correctly using the n_ports constant instead.
Fixes: 3f9e46347a46 ("net: sparx5: use SPX5_CONST for constants which already have a symbol")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The FDMA handler is responsible for scheduling a NAPI poll, which will
eventually fetch RX packets from the FDMA queue. Currently, the FDMA
handler is run in a threaded context. For some reason, this kills
performance. Admittedly, I did not do a thorough investigation to see
exactly what causes the issue, however, I noticed that in the other
driver utilizing the same FDMA engine, we run the FDMA handler in hard
IRQ context.
Fix this performance issue, by running the FDMA handler in hard IRQ
context, not deferring any work to a thread.
Prior to this change, the RX UDP performance was:
Interval Transfer Bitrate Jitter
0.00-10.20 sec 44.6 MBytes 36.7 Mbits/sec 0.027 ms
After this change, the rx UDP performance is:
Interval Transfer Bitrate Jitter
0.00-9.12 sec 1.01 GBytes 953 Mbits/sec 0.020 ms
Fixes: 10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are mixing the use of spin_lock() and spin_lock_irqsave() functions
in the PTP handler of lan969x. Fix this by correctly using the _irqsave
variants.
Fixes: 24fe83541755 ("net: lan969x: add PTP handler function")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
[1]: https://lore.kernel.org/netdev/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Depmod reports a cyclic dependency between modules sparx5-switch.ko and
lan969x-switch.ko:
depmod: ERROR: Cycle detected: lan969x_switch -> sparx5_switch -> lan969x_switch
depmod: ERROR: Found 2 modules in dependency cycles!
make[2]: *** [scripts/Makefile.modinst:132: depmod] Error 1
make: *** [Makefile:224: __sub-make] Error 2
This makes sense, as they both require symbols from each other.
Fix this by compiling lan969x support into the sparx5-switch.ko module.
In order to do this, in a sensible way, we move the lan969x/ dir into
the sparx5/ dir and do some code cleanup of code that is no longer
required.
After this patch, depmod will no longer complain, as lan969x support is
compiled into the sparx5-swicth.ko module, and can no longer be compiled
as a standalone module.
Fixes: 98a01119608d ("net: sparx5: add compatible string for lan969x")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a section mismatch warning in modpost
- Fix Debian package build error with the O= option
* tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: deb-pkg: fix build error with O=
modpost: Add .irqentry.text to OTHER_SECTIONS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix a /proc/interrupts formatting regression
- Have the BCM2836 interrupt controller enter power management states
properly
- Other fixlets
* tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
genirq/proc: Add missing space separator back
irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/gic-v3: Fix irq_complete_ack() comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Handle the case where clocksources with small counter width can,
in conjunction with overly long idle sleeps, falsely trigger the
negative motion detection of clocksources
* tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Make negative motion detection more robust
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Have the Automatic IBRS setting check on AMD does not falsely fire in
the guest when it has been set already on the host
- Make sure cacheinfo structures memory is allocated to address a boot
NULL ptr dereference on Intel Meteor Lake which has different numbers
of subleafs in its CPUID(4) leaf
- Take care of the GDT restoring on the kexec path too, as expected by
the kernel
- Make sure SMP is not disabled when IO-APIC is disabled on the kernel
cmdline
- Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
propagate changes to the kernelmode page tables, to the user portion,
in PTI
- Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
can get lost and thus user-visible delays happen
- Make sure PKRU is properly restored with XRSTOR on AMD after a PRKU
write of 0 (WRPKRU) which will mark PKRU in its init state and thus
lose the actual buffer
* tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
x86/cacheinfo: Delete global num_cache_leaves
cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
x86/kexec: Restore GDT on return from ::preserve_context kexec
x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
x86/pkeys: Ensure updated PKRU value is XRSTOR'd
x86/pkeys: Change caller of update_pkru_in_sigframe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM.
The usual bunch of singletons - please see the relevant changelogs for
details"
* tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
iio: magnetometer: yas530: use signed integer type for clamp limits
sched/numa: fix memory leak due to the overwritten vma->numab_state
mm/damon: fix order of arguments in damos_before_apply tracepoint
lib: stackinit: hide never-taken branch from compiler
mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
scatterlist: fix incorrect func name in kernel-doc
mm: correct typo in MMAP_STATE() macro
mm: respect mmap hint address when aligning for THP
mm: memcg: declare do_memsw_account inline
mm/codetag: swap tags when migrate pages
ocfs2: update seq_file index in ocfs2_dlm_seq_next
stackdepot: fix stack_depot_save_flags() in NMI context
mm: open-code page_folio() in dump_page()
mm: open-code PageTail in folio_flags() and const_folio_flags()
mm: fix vrealloc()'s KASAN poisoning logic
Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
selftests/damon: add _damon_sysfs.py to TEST_FILES
selftest: hugetlb_dio: fix test naming
ocfs2: free inode when ocfs2_get_init_inode() fails
nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
...
|
|
Fix eprobe event to unregister event call and release eprobe when it fails
to add dynamic event correctly.
Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Since commit 13b25489b6f8 ("kbuild: change working directory to external
module directory with M="), the Debian package build fails if a relative
path is specified with the O= option.
$ make O=build bindeb-pkg
[ snip ]
dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'.
Rebuilding host programs with x86_64-linux-gnu-gcc...
make[6]: Entering directory '/home/masahiro/linux/build'
/home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist. Stop.
This occurs because the sub_make_done flag is cleared, even though the
working directory is already in the output directory.
Passing KBUILD_OUTPUT=. resolves the issue.
Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:
WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
The relocation at __ex_table+0x447c references section ".irqentry.text"
which is not in the list of authorized sections.
Add .irqentry.text to OTHER_SECTIONS to cure the issue.
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
If rtnl_get_peer_net() fails, then propagate the error code. Don't
return success.
Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/a2d20cd4-387a-4475-887c-bb7d0e88e25a@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
Ocelot PTP fixes
This is another attempt at "net: mscc: ocelot: be resilient to loss of
PTP packets during transmission".
https://lore.kernel.org/netdev/20241203164755.16115-1-vladimir.oltean@nxp.com/
The central change is in patch 4/5. It recovers a port from a very
reproducible condition where previously, it would become unable to take
any hardware TX timestamp.
Then we have patches 1/5 and 5/5 which are extra bug fixes.
Patches 2/5 and 3/5 are logical changes split out of the monolithic
patch previously submitted as v1, so that patch 4/5 is hopefully easier
to follow.
====================
Link: https://patch.msgid.link/20241205145519.1236778-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An unsupported RX filter will leave the port with TX timestamping still
applied as per the new request, rather than the old setting. When
parsing the tx_type, don't apply it just yet, but delay that until after
we've parsed the rx_filter as well (and potentially returned -ERANGE for
that).
Similarly, copy_to_user() may fail, which is a rare occurrence, but
should still be treated by unwinding what was done.
Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Felix DSA driver presents unique challenges that make the simplistic
ocelot PTP TX timestamping procedure unreliable: any transmitted packet
may be lost in hardware before it ever leaves our local system.
This may happen because there is congestion on the DSA conduit, the
switch CPU port or even user port (Qdiscs like taprio may delay packets
indefinitely by design).
The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(),
runs out of timestamp IDs eventually, because it never detects that
packets are lost, and keeps the IDs of the lost packets on hold
indefinitely. The manifestation of the issue once the entire timestamp
ID range becomes busy looks like this in dmesg:
mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp
At the surface level, we need a timeout timer so that the kernel knows a
timestamp ID is available again. But there is a deeper problem with the
implementation, which is the monotonically increasing ocelot_port->ts_id.
In the presence of packet loss, it will be impossible to detect that and
reuse one of the holes created in the range of free timestamp IDs.
What we actually need is a bitmap of 63 timestamp IDs tracking which one
is available. That is able to use up holes caused by packet loss, but
also gives us a unique opportunity to not implement an actual timer_list
for the timeout timer (very complicated in terms of locking).
We could only declare a timestamp ID stale on demand (lazily), aka when
there's no other timestamp ID available. There are pros and cons to this
approach: the implementation is much more simple than per-packet timers
would be, but most of the stale packets would be quasi-leaked - not
really leaked, but blocked in driver memory, since this algorithm sees
no reason to free them.
An improved technique would be to check for stale timestamp IDs every
time we allocate a new one. Assuming a constant flux of PTP packets,
this avoids stale packets being blocked in memory, but of course,
packets lost at the end of the flux are still blocked until the flux
resumes (nobody left to kick them out).
Since implementing per-packet timers is way too complicated, this should
be good enough.
Testing procedure:
Persistently block traffic class 5 and try to run PTP on it:
$ tc qdisc replace dev swp3 parent root taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0xdf 100000 flags 0x2
[ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
$ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3
ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[70.406]: timed out while polling for tx timestamp
ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[70.406]: port 1 (swp3): send peer delay response failed
ptp4l[70.407]: port 1 (swp3): clearing fault immediately
ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1
ptp4l[71.400]: timed out while polling for tx timestamp
ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[71.401]: port 1 (swp3): send peer delay response failed
ptp4l[71.401]: port 1 (swp3): clearing fault immediately
[ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2
ptp4l[72.401]: timed out while polling for tx timestamp
ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[72.402]: port 1 (swp3): send peer delay response failed
ptp4l[72.402]: port 1 (swp3): clearing fault immediately
ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3
ptp4l[73.400]: timed out while polling for tx timestamp
ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[73.400]: port 1 (swp3): send peer delay response failed
ptp4l[73.400]: port 1 (swp3): clearing fault immediately
[ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4
ptp4l[74.400]: timed out while polling for tx timestamp
ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[74.401]: port 1 (swp3): send peer delay response failed
ptp4l[74.401]: port 1 (swp3): clearing fault immediately
ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[75.410]: timed out while polling for tx timestamp
ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[75.411]: port 1 (swp3): send peer delay response failed
ptp4l[75.411]: port 1 (swp3): clearing fault immediately
(...)
Remove the blocking condition and see that the port recovers:
$ same tc command as above, but use "sched-entry S 0xff" instead
$ same ptp4l command as above
ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost
[ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost
[ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost
[ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost
[ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d
ptp4l[104.966]: port 1 (swp3): assuming the grand master role
ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER
[ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0
(...)
Notice that in this new usage pattern, a non-congested port should
basically use timestamp ID 0 all the time, progressing to higher numbers
only if there are unacknowledged timestamps in flight. Compare this to
the old usage, where the timestamp ID used to monotonically increase
modulo OCELOT_MAX_PTP_ID.
In terms of implementation, this simplifies the bookkeeping of the
ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse
the list of two-step timestampable skbs for each new packet anyway, the
information can already be computed and does not need to be stored.
Also, ocelot_port->tx_skbs is always accessed under the switch-wide
ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's
lock and can use the unlocked primitives safely.
This problem was actually detected using the tc-taprio offload, and is
causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959)
supports but Ocelot (VSC7514) does not. Thus, I've selected the commit
to blame as the one adding initial timestamping support for the Felix
switch.
Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as
such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler().
As such, it runs with IRQs enabled, and not in hardirq context. Thus,
ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, it cannot
be preempted by ocelot_get_txtstamp(). For the same reason,
dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in
this calling context, so just simplify the dev_kfree_skb_any() call to
kfree_skb().
Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context,
not with hardirqs enabled. Thus, ocelot_get_txtstamp() which shares the
ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs.
This is part of a larger rework of the TX timestamping procedure.
A logical subportion of the rework has been split into a separate
change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This condition, theoretically impossible to trigger, is not really
handled well. By "continuing", we are skipping the write to SYS_PTP_NXT
which advances the timestamp FIFO to the next entry. So we are reading
the same FIFO entry all over again, printing stack traces and eventually
killing the kernel.
No real problem has been observed here. This is part of a larger rework
of the timestamp IRQ procedure, with this logical change split out into
a patch of its own. We will need to "goto next_ts" for other conditions
as well.
Fixes: 9fde506e0c53 ("net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP
timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb().
Otherwise, the reference to the skb clone is lost.
Fixes: 52849bcf0029 ("net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a warning in __sk_skb_reason_drop().
Commit 61b95c70f344 ("net: ip: make ip_route_input_rcu() return
drop reasons") missed a path where -EINVAL is returned.
Then, the cited commit started to trigger the warning with the
invalid error.
Let's fix it by returning SKB_DROP_REASON_NOT_SPECIFIED.
[0]:
WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.12.0-10686-gbb18265c3aba #10 1c308307628619808b5a4a0495c4aab5637b0551
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241
Code: 5d 41 5c 41 5d 41 5e e9 e7 9e 95 fd e8 e2 9e 95 fd 31 ff 44 89 e6 e8 58 a1 95 fd 45 85 e4 0f 85 a2 00 00 00 e8 ca 9e 95 fd 90 <0f> 0b 90 e8 c1 9e 95 fd 44 89 e6 bf 01 00 00 00 e8 34 a1 95 fd 41
RSP: 0018:ffa0000000007650 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000ffff RCX: ffffffff83bc3592
RDX: ff110001002a0000 RSI: ffffffff83bc34d6 RDI: 0000000000000007
RBP: ff11000109ee85f0 R08: 0000000000000001 R09: ffe21c00213dd0da
R10: 000000000000ffff R11: 0000000000000000 R12: 00000000ffffffea
R13: 0000000000000000 R14: ff11000109ee86d4 R15: ff11000109ee8648
FS: 0000000000000000(0000) GS:ff1100011a000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020177000 CR3: 0000000108a3d006 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
<IRQ>
kfree_skb_reason include/linux/skbuff.h:1263 [inline]
ip_rcv_finish_core.constprop.0+0x896/0x2320 net/ipv4/ip_input.c:424
ip_list_rcv_finish.constprop.0+0x1b2/0x710 net/ipv4/ip_input.c:610
ip_sublist_rcv net/ipv4/ip_input.c:636 [inline]
ip_list_rcv+0x34a/0x460 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x536/0x900 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x77c/0xdc0 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
gro_normal_list include/net/gro.h:511 [inline]
napi_complete_done+0x219/0x8c0 net/core/dev.c:6256
wg_packet_rx_poll+0xbff/0x1e40 drivers/net/wireguard/receive.c:488
__napi_poll.constprop.0+0xb3/0x530 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0x9eb/0xe30 net/core/dev.c:7068
handle_softirqs+0x1ac/0x740 kernel/softirq.c:554
do_softirq kernel/softirq.c:455 [inline]
do_softirq+0x48/0x80 kernel/softirq.c:442
</IRQ>
<TASK>
__local_bh_enable_ip+0xed/0x110 kernel/softirq.c:382
spin_unlock_bh include/linux/spinlock.h:396 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
wg_packet_decrypt_worker+0x3ba/0x580 drivers/net/wireguard/receive.c:499
process_one_work+0x940/0x1a70 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x639/0xe30 kernel/workqueue.c:3391
kthread+0x283/0x350 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
</TASK>
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241206020715.80207-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap
for non-paged SKB data") moved the assignment of tx_skbuff_dma[]'s
members to be later in stmmac_tso_xmit().
The buf (dma cookie) and len stored in this structure are passed to
dma_unmap_single() by stmmac_tx_clean(). The DMA API requires that
the dma cookie passed to dma_unmap_single() is the same as the value
returned from dma_map_single(). However, by moving the assignment
later, this is not the case when priv->dma_cap.addr64 > 32 as "des"
is offset by proto_hdr_len.
This causes problems such as:
dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
and with DMA_API_DEBUG enabled:
DMA-API: dwc-eth-dwmac 2490000.ethernet: device driver tries to +free DMA memory it has not allocated [device address=0x000000ffffcf65c0] [size=66 bytes]
Fix this by maintaining "des" as the original DMA cookie, and use
tso_des to pass the offset DMA cookie to stmmac_tso_allocator().
Full details of the crashes can be found at:
https://lore.kernel.org/all/d8112193-0386-4e14-b516-37c2d838171a@nvidia.com/
https://lore.kernel.org/all/klkzp5yn5kq5efgtrow6wbvnc46bcqfxs65nz3qy77ujr5turc@bwwhelz2l4dw/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Fixes: 66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data")
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/E1tJXcx-006N4Z-PC@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull smb client fixes from Steve French:
- DFS fix (for race with tree disconnect and dfs cache worker)
- Four fixes for SMB3.1.1 posix extensions:
- improve special file support e.g. to Samba, retrieving the file
type earlier
- reduce roundtrips (e.g. on ls -l, in some cases)
* tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix potential race in cifs_put_tcon()
smb3.1.1: fix posix mounts to older servers
fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points
fs/smb/client: Implement new SMB3 POSIX type
fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Large number of small fixes, all in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
scsi: scsi_debug: Fix hrtimer support for ndelay
scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
scsi: ufs: core: Add missing post notify for power mode change
scsi: sg: Fix slab-use-after-free read in sg_release()
scsi: ufs: core: sysfs: Prevent div by zero
scsi: qla2xxx: Update version to 10.02.09.400-k
scsi: qla2xxx: Supported speed displayed incorrectly for VPorts
scsi: qla2xxx: Fix NVMe and NPIV connect issue
scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt
scsi: qla2xxx: Fix use after free on unload
scsi: qla2xxx: Fix abort in bsg timeout
scsi: mpi3mr: Update driver version to 8.12.0.3.50
scsi: mpi3mr: Handling of fault code for insufficient power
scsi: mpi3mr: Start controller indexing from 0
scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
scsi: mpi3mr: Synchronize access to ioctl data buffer
scsi: mpt3sas: Update driver version to 51.100.00.00
scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
...
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Target fix using incorrect zero buffer (Nilay)
- Device specifc deallocate quirk fixes (Christoph, Keith)
- Fabrics fix for handling max command target bugs (Maurizio)
- Cocci fix usage for kzalloc (Yu-Chen)
- DMA size fix for host memory buffer feature (Christoph)
- Fabrics queue cleanup fixes (Chunguang)
- CPU hotplug ordering fixes
- Add missing MODULE_DESCRIPTION for rnull
- bcache error value fix
- virtio-blk queue freeze fix
* tag 'block-6.13-20241207' of git://git.kernel.dk/linux:
blk-mq: move cpuhp callback registering out of q->sysfs_lock
blk-mq: register cpuhp callback after hctx is added to xarray table
virtio-blk: don't keep queue frozen during system suspend
nvme-tcp: simplify nvme_tcp_teardown_io_queues()
nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
nvme-rdma: unquiesce admin_q before destroy it
nvme-tcp: fix the memleak while create new ctrl failed
nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
nvmet: replace kmalloc + memset with kzalloc for data allocation
nvme-fabrics: handle zero MAXCMD without closing the connection
bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again
nvme-pci: remove two deallocate zeroes quirks
block: rnull: add missing MODULE_DESCRIPTION
nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
|
|
Pull io_uring fix from Jens Axboe:
"A single fix for a parameter type which affects 32-bit"
* tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux:
io_uring: Change res2 parameter type in io_uring_cmd_done
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2 fix from Richard Weinberger:
- Fixup rtime compressor bounds checking
* tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: Fix rtime decompressor
|
|
Linus noticed that the new if_not_guard() definition is fragile:
"This macro generates actively wrong code if it happens to be inside an
if-statement or a loop without a block.
IOW, code like this:
for (iterate-over-something)
if_not_guard(a)
return -BUSY;
looks like will build fine, but will generate completely incorrect code."
The reason is that the __if_not_guard() macro is multi-statement, so
while most kernel developers expect macros to be simple or at least
compound statements - but for __if_not_guard() it is not so:
#define __if_not_guard(_name, _id, args...) \
BUILD_BUG_ON(!__is_cond_ptr(_name)); \
CLASS(_name, _id)(args); \
if (!__guard_ptr(_name)(&_id))
To add insult to injury, the placement of the BUILD_BUG_ON() line makes
the macro appear to compile fine, but it will generate incorrect code
as Linus reported, for example if used within iteration or conditional
statements that will use the first statement of a macro as a loop body
or conditional statement body.
[ I'd also like to note that the original submission by David Lechner did
not contain the BUILD_BUG_ON() line, so it was safer than what we ended
up committing. Mea culpa. ]
It doesn't appear to be possible to turn this macro into a robust
single or compound statement that could be used in single statements,
due to the necessity to define an auto scope variable with an open
scope and the necessity of it having to expand to a partial 'if'
statement with no body.
Instead of trying to work around this fragility, just remove the
construct before it gets used.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Lechner <dlechner@baylibre.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/Z1LBnX9TpZLR5Dkf@gmail.com
|
|
Ilya reported a slab-use-after-free in dst_destroy [1]
Issue is in xfrm6_net_init() and xfrm4_net_init() :
They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
But net structure might be freed before all the dst callbacks are
called. So when dst_destroy() calls later :
if (dst->ops->destroy)
dst->ops->destroy(dst);
dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
See a relevant issue fixed in :
ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
A fix is to queue the 'struct net' to be freed after one
another cleanup_net() round (and existing rcu_barrier())
[1]
BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112)
Read of size 8 at addr ffff8882137ccab0 by task swapper/37/0
Dec 03 05:46:18 kernel:
CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:124)
print_address_description.constprop.0 (mm/kasan/report.c:378)
? dst_destroy (net/core/dst.c:112)
print_report (mm/kasan/report.c:489)
? dst_destroy (net/core/dst.c:112)
? kasan_addr_to_slab (mm/kasan/common.c:37)
kasan_report (mm/kasan/report.c:603)
? dst_destroy (net/core/dst.c:112)
? rcu_do_batch (kernel/rcu/tree.c:2567)
dst_destroy (net/core/dst.c:112)
rcu_do_batch (kernel/rcu/tree.c:2567)
? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406)
rcu_core (kernel/rcu/tree.c:2825)
handle_softirqs (kernel/softirq.c:554)
__irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
irq_exit_rcu (kernel/softirq.c:651)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743)
Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
RSP: 0018:ffff888100d2fe00 EFLAGS: 00000246
RAX: 00000000001870ed RBX: 1ffff110201a5fc2 RCX: ffffffffb61a3e46
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb3d4d123
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c7e1835d
R10: ffff888e3f0c1aeb R11: 0000000000000000 R12: 0000000000000000
R13: ffff888100d20000 R14: dffffc0000000000 R15: 0000000000000000
? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148)
? cpuidle_idle_call (kernel/sched/idle.c:186)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
cpuidle_idle_call (kernel/sched/idle.c:186)
? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406)
? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59)
do_idle (kernel/sched/idle.c:326)
cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282)
? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232)
? soft_restart_cpu (arch/x86/kernel/head_64.S:452)
common_startup_64 (arch/x86/kernel/head_64.S:414)
</TASK>
Dec 03 05:46:18 kernel:
Allocated by task 12184:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
__kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3313)
__x64_sys_unshare (kernel/fork.c:3382)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Freed by task 11:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
kasan_save_free_info (mm/kasan/generic.c:582)
__kasan_slab_free (mm/kasan/common.c:271)
kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647)
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 03 05:46:18 kernel:
Last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610)
xfrm_add_policy (net/xfrm/xfrm_user.c:2116)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Second to last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
__xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513)
xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940)
xfrm_add_sa (net/xfrm/xfrm_user.c:912)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes: a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_tO8WQg@mail.gmail.com/T/#m02c98c3009fe66382b73cfb4db9cf1df6fab3fbf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241204125455.3871859-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|