summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-19net: airoha: Always check return value from airoha_ppe_foe_get_entry()Lorenzo Bianconi
airoha_ppe_foe_get_entry routine can return NULL, so check the returned pointer is not NULL in airoha_ppe_foe_flow_l2_entry_update() Fixes: b81e0f2b58be3 ("net: airoha: Add FLOW_CLS_STATS callback support") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250618-check-ret-from-airoha_ppe_foe_get_entry-v2-1-068dcea3cc66@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19NFC: nci: uart: Set tty->disc_data only in success pathKrzysztof Kozlowski
Setting tty->disc_data before opening the NCI device means we need to clean it up on error paths. This also opens some short window if device starts sending data, even before NCIUARTSETDRIVER IOCTL succeeded (broken hardware?). Close the window by exposing tty->disc_data only on the success path, when opening of the NCI device and try_module_get() succeeds. The code differs in error path in one aspect: tty->disc_data won't be ever assigned thus NULL-ified. This however should not be relevant difference, because of "tty->disc_data=NULL" in nci_uart_tty_open(). Cc: Linus Torvalds <torvalds@linuxfoundation.org> Fixes: 9961127d4bce ("NFC: nci: add generic uart support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250618073649.25049-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().Kuniyuki Iwashima
syzkaller reported a null-ptr-deref in sock_omalloc() while allocating a CALIPSO option. [0] The NULL is of struct sock, which was fetched by sk_to_full_sk() in calipso_req_setattr(). Since commit a1a5344ddbe8 ("tcp: avoid two atomic ops for syncookies"), reqsk->rsk_listener could be NULL when SYN Cookie is returned to its client, as hinted by the leading SYN Cookie log. Here are 3 options to fix the bug: 1) Return 0 in calipso_req_setattr() 2) Return an error in calipso_req_setattr() 3) Alaways set rsk_listener 1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie for CALIPSO. 3) is also no go as there have been many efforts to reduce atomic ops and make TCP robust against DDoS. See also commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood"). As of the blamed commit, SYN Cookie already did not need refcounting, and no one has stumbled on the bug for 9 years, so no CALIPSO user will care about SYN Cookie. Let's return an error in calipso_req_setattr() and calipso_req_delattr() in the SYN Cookie case. This can be reproduced by [1] on Fedora and now connect() of nc times out. [0]: TCP: request_sock_TCPv6: Possible SYN flooding on port [::]:20002. Sending cookies. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 3 UID: 0 PID: 12262 Comm: syz.1.2611 Not tainted 6.14.0 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_pnet include/net/net_namespace.h:406 [inline] RIP: 0010:sock_net include/net/sock.h:655 [inline] RIP: 0010:sock_kmalloc+0x35/0x170 net/core/sock.c:2806 Code: 89 d5 41 54 55 89 f5 53 48 89 fb e8 25 e3 c6 fd e8 f0 91 e3 00 48 8d 7b 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b RSP: 0018:ffff88811af89038 EFLAGS: 00010216 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff888105266400 RDX: 0000000000000006 RSI: ffff88800c890000 RDI: 0000000000000030 RBP: 0000000000000050 R08: 0000000000000000 R09: ffff88810526640e R10: ffffed1020a4cc81 R11: ffff88810526640f R12: 0000000000000000 R13: 0000000000000820 R14: ffff888105266400 R15: 0000000000000050 FS: 00007f0653a07640(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f863ba096f4 CR3: 00000000163c0005 CR4: 0000000000770ef0 PKRU: 80000000 Call Trace: <IRQ> ipv6_renew_options+0x279/0x950 net/ipv6/exthdrs.c:1288 calipso_req_setattr+0x181/0x340 net/ipv6/calipso.c:1204 calipso_req_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:597 netlbl_req_setattr+0x18a/0x440 net/netlabel/netlabel_kapi.c:1249 selinux_netlbl_inet_conn_request+0x1fb/0x320 security/selinux/netlabel.c:342 selinux_inet_conn_request+0x1eb/0x2c0 security/selinux/hooks.c:5551 security_inet_conn_request+0x50/0xa0 security/security.c:4945 tcp_v6_route_req+0x22c/0x550 net/ipv6/tcp_ipv6.c:825 tcp_conn_request+0xec8/0x2b70 net/ipv4/tcp_input.c:7275 tcp_v6_conn_request+0x1e3/0x440 net/ipv6/tcp_ipv6.c:1328 tcp_rcv_state_process+0xafa/0x52b0 net/ipv4/tcp_input.c:6781 tcp_v6_do_rcv+0x8a6/0x1a40 net/ipv6/tcp_ipv6.c:1667 tcp_v6_rcv+0x505e/0x5b50 net/ipv6/tcp_ipv6.c:1904 ip6_protocol_deliver_rcu+0x17c/0x1da0 net/ipv6/ip6_input.c:436 ip6_input_finish+0x103/0x180 net/ipv6/ip6_input.c:480 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0x13c/0x6b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:469 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] ip6_rcv_finish+0xb6/0x490 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0xf9/0x490 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core+0x12e/0x1f0 net/core/dev.c:5896 __netif_receive_skb+0x1d/0x170 net/core/dev.c:6009 process_backlog+0x41e/0x13b0 net/core/dev.c:6357 __napi_poll+0xbd/0x710 net/core/dev.c:7191 napi_poll net/core/dev.c:7260 [inline] net_rx_action+0x9de/0xde0 net/core/dev.c:7382 handle_softirqs+0x19a/0x770 kernel/softirq.c:561 do_softirq.part.0+0x36/0x70 kernel/softirq.c:462 </IRQ> <TASK> do_softirq arch/x86/include/asm/preempt.h:26 [inline] __local_bh_enable_ip+0xf1/0x110 kernel/softirq.c:389 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0xc2a/0x3c40 net/core/dev.c:4679 dev_queue_xmit include/linux/netdevice.h:3313 [inline] neigh_hh_output include/net/neighbour.h:523 [inline] neigh_output include/net/neighbour.h:537 [inline] ip6_finish_output2+0xd69/0x1f80 net/ipv6/ip6_output.c:141 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline] ip6_finish_output+0x5dc/0xd60 net/ipv6/ip6_output.c:226 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x24b/0x8d0 net/ipv6/ip6_output.c:247 dst_output include/net/dst.h:459 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_xmit+0xbbc/0x20d0 net/ipv6/ip6_output.c:366 inet6_csk_xmit+0x39a/0x720 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x1a7b/0x3b40 net/ipv4/tcp_output.c:1471 tcp_transmit_skb net/ipv4/tcp_output.c:1489 [inline] tcp_send_syn_data net/ipv4/tcp_output.c:4059 [inline] tcp_connect+0x1c0c/0x4510 net/ipv4/tcp_output.c:4148 tcp_v6_connect+0x156c/0x2080 net/ipv6/tcp_ipv6.c:333 __inet_stream_connect+0x3a7/0xed0 net/ipv4/af_inet.c:677 tcp_sendmsg_fastopen+0x3e2/0x710 net/ipv4/tcp.c:1039 tcp_sendmsg_locked+0x1e82/0x3570 net/ipv4/tcp.c:1091 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1358 inet6_sendmsg+0xb9/0x150 net/ipv6/af_inet6.c:659 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0xf4/0x2a0 net/socket.c:733 __sys_sendto+0x29a/0x390 net/socket.c:2187 __do_sys_sendto net/socket.c:2194 [inline] __se_sys_sendto net/socket.c:2190 [inline] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2190 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc3/0x1d0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f06553c47ed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f0653a06fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f0655605fa0 RCX: 00007f06553c47ed RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b RBP: 00007f065545db38 R08: 0000200000000140 R09: 000000000000001c R10: f7384d4ea84b01bd R11: 0000000000000246 R12: 0000000000000000 R13: 00007f0655605fac R14: 00007f0655606038 R15: 00007f06539e7000 </TASK> Modules linked in: [1]: dnf install -y selinux-policy-targeted policycoreutils netlabel_tools procps-ng nmap-ncat mount -t selinuxfs none /sys/fs/selinux load_policy netlabelctl calipso add pass doi:1 netlabelctl map del default netlabelctl map add default address:::1 protocol:calipso,1 sysctl net.ipv4.tcp_syncookies=2 nc -l ::1 80 & nc ::1 80 Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19MAINTAINERS: Remove Shannon Nelson from MAINTAINERS fileShannon Nelson
Brett Creeley is taking ownership of AMD/Pensando drivers while I wander off into the sunset with my retirement this month. I'll still keep an eye out on a few topics for awhile, and maybe do some free-lance work in the future. Meanwhile, thank you all for the fun and support and the many learning opportunities :-). Special thanks go to DaveM for merging my first patch long ago, the big ionic patchset a few years ago, and my last patchset last week. Redirect things to a non-corporate account. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20250616224437.56581-1-shannon.nelson@amd.com [Jakub: squash in the .mailmap update] Signed-off-by: Shannon Nelson <sln@onemain.com> Link: https://patch.msgid.link/20250619010603.1173141-1-sln@onemain.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19Merge tag 'linux-can-next-for-6.17-20250618' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-06-18 All 10 patches are by Geert Uytterhoeven, target the rcar_canfd driver, first cleanup/refactor the driver and then add support for Transceiver Delay Compensation. * tag 'linux-can-next-for-6.17-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: rcar_canfd: Add support for Transceiver Delay Compensation can: rcar_canfd: Return early in rcar_canfd_set_bittiming() when not FD can: rcar_canfd: Share config code in rcar_canfd_set_bittiming() can: rcar_canfd: Rename rcar_canfd_setrnc() to rcar_canfd_set_rnc() can: rcar_canfd: Repurpose f_dcfg base for other registers can: rcar_canfd: Simplify data access in rcar_canfd_{ge,pu}t_data() can: rcar_canfd: Add helper variable dev to rcar_canfd_reset_controller() can: rcar_canfd: Add helper variable ndev to rcar_canfd_rx_pkt() can: rcar_canfd: Remove bittiming debug prints can: rcar_canfd: Consistently use ndev for net_device pointers ==================== Link: https://patch.msgid.link/20250618092336.2175168-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19net: lan743x: fix potential out-of-bounds write in ↵Alexey Kodanev
lan743x_ptp_io_event_clock_get() Before calling lan743x_ptp_io_event_clock_get(), the 'channel' value is checked against the maximum value of PCI11X1X_PTP_IO_MAX_CHANNELS(8). This seems correct and aligns with the PTP interrupt status register (PTP_INT_STS) specifications. However, lan743x_ptp_io_event_clock_get() writes to ptp->extts[] with only LAN743X_PTP_N_EXTTS(4) elements, using channel as an index: lan743x_ptp_io_event_clock_get(..., u8 channel,...) { ... /* Update Local timestamp */ extts = &ptp->extts[channel]; extts->ts.tv_sec = sec; ... } To avoid an out-of-bounds write and utilize all the supported GPIO inputs, set LAN743X_PTP_N_EXTTS to 8. Detected using the static analysis tool - Svace. Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)") Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Rengarajan S <rengarajan.s@microchip.com> Link: https://patch.msgid.link/20250616113743.36284-1-aleksei.kodanev@bell-sw.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19Merge branch 'selftests-net-use-slowwait-to-make-sure-setup-finished'Paolo Abeni
Hangbin Liu says: ==================== selftests: net: use slowwait to make sure setup finished The two updated tests sometimes failed because the network setup hadn't completed. Used slowwait to ensure the setup finished and the tests always passed. I ran both tests 50 times, and all of them passed. ==================== Link: https://patch.msgid.link/20250617105101.433718-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19selftests: net: use slowwait to make sure IPv6 setup finishedHangbin Liu
Sometimes the vxlan vnifiltering test failed on slow machines due to network setup not finished. e.g. TEST: VM connectivity over vnifiltering vxlan (ipv4 default rdst) [ OK ] TEST: VM connectivity over vnifiltering vxlan (ipv6 default rdst) [FAIL] Let's use slowwait to make sure the connection is finished. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250617105101.433718-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19selftests: net: use slowwait to stabilize vrf_route_leaking testHangbin Liu
The vrf_route_leaking test occasionally fails due to connectivity issues in our testing environment. A sample failure message shows that the ping check fails intermittently PING 2001:db8:16:2::2 (2001:db8:16:2::2) 56 data bytes --- 2001:db8:16:2::2 ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms TEST: Basic IPv6 connectivity [FAIL] This is likely due to insufficient wait time on slower machines. To address this, switch to using slowwait, which provides a longer and more reliable wait for setup completion. Before this change, the test failed 3 out of 10 times. After applying this fix, the test was run 30 times without any failure. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250617105101.433718-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19Merge branch 'support-bandwidth-clamping-in-mana-using-net-shapers'Paolo Abeni
Erni Sri Satya Vennela says: ==================== Support bandwidth clamping in mana using net shapers This patchset introduces hardware-backed bandwidth rate limiting for MANA NICs via the net_shaper_ops interface, enabling efficient and fine-grained traffic shaping directly on the device. Previously, MANA lacked a mechanism for user-configurable bandwidth control. With this addition, users can now configure shaping parameters, allowing better traffic management and performance isolation. The implementation includes the net_shaper_ops callbacks in the MANA driver and supports one shaper per vport. Add shaping support via mana_set_bw_clamp(), allowing the configuration of bandwidth rates in 100 Mbps increments (minimum 100 Mbps). The driver validates input and rejects unsupported values. On failure, it restores the previous configuration which is queried using mana_query_link_cfg() or retains the current state. To prevent potential deadlocks introduced by net_shaper_ops, switch to _locked variants of NAPI APIs when netdevops_lock is held during VF setup and teardown. Also, Add support for ethtool get_link_ksettings to report the maximum link speed supported by the SKU in mbps. These APIs when invoked on hardware that are older or that do not support these APIs, the speed would be reported as UNKNOWN and the net-shaper calls to set speed would fail. ==================== Link: https://patch.msgid.link/1750144656-2021-1-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19net: mana: Handle unsupported HWC commandsErni Sri Satya Vennela
If any of the HWC commands are not recognized by the underlying hardware, the hardware returns the response header status of -1. Log the information using netdev_info_once to avoid multiple error logs in dmesg. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/1750144656-2021-5-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19net: mana: Add speed support in mana_get_link_ksettingsErni Sri Satya Vennela
Allow mana ethtool get_link_ksettings operation to report the maximum speed supported by the SKU in mbps. The driver retrieves this information by issuing a HWC command to the hardware via mana_query_link_cfg(), which retrieves the SKU's maximum supported speed. These APIs when invoked on hardware that are older/do not support these APIs, the speed would be reported as UNKNOWN. Before: $ethtool enP30832s1 > Settings for enP30832s1: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: Unknown! Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: yes After: $ethtool enP30832s1 > Settings for enP30832s1: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: 16000Mb/s Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: yes Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1750144656-2021-4-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19net: mana: Add support for net_shaper_opsErni Sri Satya Vennela
Introduce support for net_shaper_ops in the MANA driver, enabling configuration of rate limiting on the MANA NIC. To apply rate limiting, the driver issues a HWC command via mana_set_bw_clamp() and updates the corresponding shaper object in the net_shaper cache. If an error occurs during this process, the driver restores the previous speed by querying the current link configuration using mana_query_link_cfg(). The minimum supported bandwidth is 100 Mbps, and only values that are exact multiples of 100 Mbps are allowed. Any other values are rejected. To remove a shaper, the driver resets the bandwidth to the maximum supported by the SKU using mana_set_bw_clamp() and clears the associated cache entry. If an error occurs during this process, the shaper details are retained. On the hardware that does not support these APIs, the net-shaper calls to set speed would fail. Set the speed: ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/net_shaper.yaml \ --do set --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }, "bw-max": 200000000 }' Get the shaper details: ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/net_shaper.yaml \ --do get --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }}' > {'bw-max': 200000000, > 'handle': {'scope': 'netdev'}, > 'ifindex': $IFINDEX, > 'metric': 'bps'} Delete the shaper object: ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/net_shaper.yaml \ --do delete --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev","id":'$ID' }}' Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1750144656-2021-3-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19net: mana: Fix potential deadlocks in mana napi opsErni Sri Satya Vennela
When net_shaper_ops are enabled for MANA, netdev_ops_lock becomes active. MANA VF setup/teardown by netvsc follows this call chain: netvsc_vf_setup() dev_change_flags() ... __dev_open() OR __dev_close() dev_change_flags() holds the netdev mutex via netdev_lock_ops. Meanwhile, mana_create_txq() and mana_create_rxq() in mana_open() path call NAPI APIs (netif_napi_add_tx(), netif_napi_add_weight(), napi_enable()), which also try to acquire the same lock, risking deadlock. Similarly in the teardown path (mana_close()), netif_napi_disable() and netif_napi_del(), contend for the same lock. Switch to the _locked variants of these APIs to avoid deadlocks when the netdev_ops_lock is held. Fixes: d4c22ec680c8 ("net: hold netdev instance lock during ndo_open/ndo_stop") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1750144656-2021-2-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19eth: fbnic: avoid double free when failing to DMA-map FW msgJakub Kicinski
The semantics are that caller of fbnic_mbx_map_msg() retains the ownership of the message on error. All existing callers dutifully free the page. Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism") Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250616195510.225819-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19ipv6: Simplify link-local address generation for IPv6 GRE.Guillaume Nault
Since commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address generation."), addrconf_gre_config() has stopped handling IP6GRE devices specially and just calls the regular addrconf_addr_gen() function to create their link-local IPv6 addresses. We can thus avoid using addrconf_gre_config() for IP6GRE devices and use the normal IPv6 initialisation path instead (that is, jump directly to addrconf_dev_config() in addrconf_init_auto_addrs()). See commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address generation.") for a deeper explanation on how and why GRE devices started handling their IPv6 link-local address generation specially, why it was a problem, and why this is not even necessary in most cases (especially for GRE over IPv6). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/a9144be9c7ec3cf09f25becae5e8fdf141fde9f6.1750075076.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19net/mlx4_en: Remove the redundant NULL check for the 'my_ets' objectAndrey Vatoropin
Static analysis shows that pointer "my_ets" cannot be NULL because it points to the object "struct ieee_ets". Remove the extra NULL check. It is meaningless and harms the readability of the code. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250616045034.26000-1-a.vatoropin@crpt.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-18Merge branch 'add-support-for-pse-budget-evaluation-strategy'Jakub Kicinski
Kory Maincent says: ==================== Add support for PSE budget evaluation strategy This series brings support for budget evaluation strategy in the PSE subsystem. PSE controllers can set priorities to decide which ports should be turned off in case of special events like over-current. This patch series adds support for two budget evaluation strategy. 1. Static Method: This method involves distributing power based on PD classification. It’s straightforward and stable, the PSE core keeping track of the budget and subtracting the power requested by each PD’s class. Advantages: Every PD gets its promised power at any time, which guarantees reliability. Disadvantages: PD classification steps are large, meaning devices request much more power than they actually need. As a result, the power supply may only operate at, say, 50% capacity, which is inefficient and wastes money. 2. Dynamic Method: To address the inefficiencies of the static method, vendors like Microchip have introduced dynamic power budgeting, as seen in the PD692x0 firmware. This method monitors the current consumption per port and subtracts it from the available power budget. When the budget is exceeded, lower-priority ports are shut down. Advantages: This method optimizes resource utilization, saving costs. Disadvantages: Low-priority devices may experience instability. The UAPI allows adding support for software port priority mode managed from userspace later if needed. Patches 1-2: Add support for interrupt event report in PSE core, ethtool and ethtool specs. Patch 3: Adds support for interrupt and event report in TPS23881 driver. Patches 4,5: Add support for PSE power domain in PSE core and ethtool. Patches 6-8: Add support for budget evaluation strategy in PSE core, ethtool and ethtool specs. Patches 9-11: Add support for port priority and power supplies in PD692x0 drivers. Patches 12,13: Add support for port priority in TPS23881 drivers. ==================== Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-0-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18dt-bindings: net: pse-pd: ti,tps23881: Add interrupt descriptionKory Maincent (Dent Project)
Add an interrupt property to the device tree bindings for the TI TPS23881 PSE controller. The interrupt is primarily used to detect classification and disconnection events, which are essential for managing the PSE controller in compliance with the PoE standard. Interrupt support is essential for the proper functioning of the TPS23881 controller. Without it, after a power-on (PWON), the controller will no longer perform detection and classification. This could lead to potential hazards, such as connecting a non-PoE device after a PoE device, which might result in magic smoke. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-13-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: tps23881: Add support for static port priority featureKory Maincent (Dent Project)
This patch enhances PSE callbacks by introducing support for the static port priority feature. It extends interrupt management to handle and report detection, classification, and disconnection events. Additionally, it introduces the pi_get_pw_req() callback, which provides information about the power requested by the Powered Devices. Interrupt support is essential for the proper functioning of the TPS23881 controller. Without it, after a power-on (PWON), the controller will no longer perform detection and classification. This could lead to potential hazards, such as connecting a non-PoE device after a PoE device, which might result in magic smoke. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-12-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18dt-bindings: net: pse-pd: microchip,pd692x0: Add manager regulator supplyKory Maincent (Dent Project)
Adds the regulator supply parameter of the managers. Update also the example as the regulator supply of the PSE PIs should be the managers itself and not an external regulator. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-11-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: pd692x0: Add support for controller and manager power suppliesKory Maincent (Dent Project)
Add support for managing the VDD and VDDA power supplies for the PD692x0 PSE controller, as well as the VAUX5 and VAUX3P3 power supplies for the PD6920x PSE managers. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-10-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: pd692x0: Add support for PSE PI priority featureKory Maincent (Dent Project)
This patch extends the PSE callbacks by adding support for the newly introduced pi_set_prio() callback, enabling the configuration of PSE PI priorities. The current port priority is now also included in the status information returned to users. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-9-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ethtool: Add PSE port priority support featureKory Maincent (Dent Project)
This patch expands the status information provided by ethtool for PSE c33 with current port priority and max port priority. It also adds a call to pse_ethtool_set_prio() to configure the PSE port priority. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-8-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: Add support for budget evaluation strategiesKory Maincent (Dent Project)
This patch introduces the ability to configure the PSE PI budget evaluation strategies. Budget evaluation strategies is utilized by PSE controllers to determine which ports to turn off first in scenarios such as power budget exceedance. The pis_prio_max value is used to define the maximum priority level supported by the controller. Both the current priority and the maximum priority are exposed to the user through the pse_ethtool_get_status call. This patch add support for two mode of budget evaluation strategies. 1. Static Method: This method involves distributing power based on PD classification. It’s straightforward and stable, the PSE core keeping track of the budget and subtracting the power requested by each PD’s class. Advantages: Every PD gets its promised power at any time, which guarantees reliability. Disadvantages: PD classification steps are large, meaning devices request much more power than they actually need. As a result, the power supply may only operate at, say, 50% capacity, which is inefficient and wastes money. Priority max value is matching the number of PSE PIs within the PSE. 2. Dynamic Method: To address the inefficiencies of the static method, vendors like Microchip have introduced dynamic power budgeting, as seen in the PD692x0 firmware. This method monitors the current consumption per port and subtracts it from the available power budget. When the budget is exceeded, lower-priority ports are shut down. Advantages: This method optimizes resource utilization, saving costs. Disadvantages: Low-priority devices may experience instability. Priority max value is set by the PSE controller driver. For now, budget evaluation methods are not configurable and cannot be mixed. They are hardcoded in the PSE driver itself, as no current PSE controller supports both methods. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-7-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: Add helper to report hardware enable status of the PIKory Maincent (Dent Project)
Refactor code by introducing a helper function to retrieve the hardware enabled state of the PI, avoiding redundant implementations in the future. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-6-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ethtool: Add support for new power domains index descriptionKory Maincent (Dent Project)
Report the index of the newly introduced PSE power domain to the user, enabling improved management of the power budget for PSE devices. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-5-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: Add support for PSE power domainsKory Maincent (Dent Project)
Introduce PSE power domain support as groundwork for upcoming port priority features. Multiple PSE PIs can now be grouped under a single PSE power domain, enabling future enhancements like defining available power budgets, port priority modes, and disconnection policies. This setup will allow the system to assess whether activating a port would exceed the available power budget, preventing over-budget states proactively. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-4-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: tps23881: Add support for PSE events and interruptsKory Maincent (Dent Project)
Add support for PSE event reporting through interrupts. Set up the newly introduced devm_pse_irq_helper helper to register the interrupt. Events are reported for over-current and over-temperature conditions. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-3-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: Add support for reporting eventsKory Maincent (Dent Project)
Add support for devm_pse_irq_helper() to register PSE interrupts and report events such as over-current or over-temperature conditions. This follows a similar approach to the regulator API but also sends notifications using a dedicated PSE ethtool netlink socket. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-2-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: pse-pd: Introduce attached_phydev to pse controlKory Maincent (Dent Project)
In preparation for reporting PSE events via ethtool notifications, introduce an attached_phydev field in the pse_control structure. This field stores the phy_device associated with the PSE PI, ensuring that notifications are sent to the correct network interface. The attached_phydev pointer is directly tied to the PHY lifecycle. It is set when the PHY is registered and cleared when the PHY is removed. There is no need to use a refcount, as doing so could interfere with the PHY removal process. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-1-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Merge branch 'phc-support-in-ena-driver'Jakub Kicinski
David Arinzon says: ==================== PHC support in ENA driver This patchset adds the support for PHC (PTP Hardware Clock) in the ENA driver. The documentation part of the patchset includes additional information, including statistics, utilization and invocation examples through the testptp utility. ==================== Link: https://patch.msgid.link/20250617110545.5659-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Add PHC documentationDavid Arinzon
Provide the relevant information and guidelines about the feature support in the ENA driver. Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-10-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: View PHC stats using debugfsDavid Arinzon
Add an entry named `phc_stats` to view the PHC statistics. If PHC is enabled, the stats are printed, as below: phc_cnt: 0 phc_exp: 0 phc_skp: 0 phc_err_dv: 0 phc_err_ts: 0 If PHC is disabled, no statistics will be displayed. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-9-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Add debugfs support to the ENA driverDavid Arinzon
Adding the base directory of debugfs to the driver. In order for the folder to be unique per driver instantiation, the chosen name is the device name. This commit contains the initialization and the base folder. The creation of the base folder may fail, but is considered non-fatal. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-8-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Control PHC enable through devlinkDavid Arinzon
Add the capability to set parameters through the devlink framework. The parameter used for controlling PHC (enable/disable) details are as follows: - Name: enable_phc - Type: Boolean (true - enable/false - disable) - Mode: DEVLINK_PARAM_CMODE_DRIVERINIT - Effect: Changes take place during driver initialization, any changes require a devlink reload to take effect. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-7-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18devlink: Add new "enable_phc" generic device paramDavid Arinzon
Add a new device generic parameter to enable/disable the PHC (PTP Hardware Clock) functionality in the device associated with the devlink instance. Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250617110545.5659-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Add devlink port supportDavid Arinzon
Add the basic functionality to support devlink port for devlink model completeness purposes. Current support is for registration/un-registration. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Add device reload capability through devlinkDavid Arinzon
Adding basic devlink capability support of reloading the driver. This capability is required to support driver init type devlink params (DEVLINK_PARAM_CMODE_DRIVERINIT). Such params require reloading of the driver (destroy/restore sequence). The reloading is done by the devlink framework using the hooks provided by the driver. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: PHC silent resetDavid Arinzon
Each PHC device kernel registration receives a unique kernel index, which is associated with a new PHC device file located at "/dev/ptp<index>". This device file serves as an interface for obtaining PHC timestamps. Examples of tools that use "/dev/ptp" include testptp [1] and chrony [2]. A reset flow may occur in the ENA driver while PHC is active. During a reset, the driver will unregister and then re-register the PHC device with the kernel. Under race conditions, particularly during heavy PHC loads, the driver’s reset flow might complete faster than the kernel’s PHC unregister/register process. This can result in the PHC index being different from what it was prior to the reset, as the PHC index is selected using kernel ID allocation [3]. While driver rmmod/insmod are done by the user, a reset may occur at anytime, without the user awareness, consequently, the driver might receive a new PHC index after the reset, potentially compromising the user experience. To prevent this issue, the PHC flow will detect the reset during PHC destruction and will skip the PHC unregister/register calls to preserve the kernel PHC index. During the reset flow, any attempt to get a PHC timestamp will fail as expected, but the kernel PHC index will remain unchanged. [1]: https://github.com/torvalds/linux/blob/v6.1/tools/testing/selftests/ptp/testptp.c [2]: https://github.com/mlichvar/chrony [3]: https://www.kernel.org/doc/html/latest/core-api/idr.html Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: ena: Add PHC support in the ENA driverDavid Arinzon
The ENA driver will be extended to support the new PHC feature using ptp_clock interface [1]. this will provide timestamp reference for user space to allow measuring time offset between the PHC and the system clock in order to achieve nanosecond accuracy. [1] - https://www.kernel.org/doc/html/latest/driver-api/ptp.html Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Merge branch 'udp_tunnel-remove-rtnl_lock-dependency'Jakub Kicinski
Stanislav Fomichev says: ==================== udp_tunnel: remove rtnl_lock dependency Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add separate (global) mutext to protect udp_tunnel state. ==================== Link: https://patch.msgid.link/20250616162117.287806-1-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"Stanislav Fomichev
This reverts commit 325eb217e41fa14f307c7cc702bd18d0bb38fe84. udp_tunnel infra doesn't need RTNL, should be safe to get back to only netdev instance lock. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18netdevsim: remove udp_ports_sleepStanislav Fomichev
Now that there is only one path in udp_tunnel, there is no need to have udp_ports_sleep knob. Remove it and adjust the test. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-6-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18net: remove redundant ASSERT_RTNL() in queue setup functionsStanislav Fomichev
The existing netdev_ops_assert_locked() already asserts that either the RTNL lock or the per-device lock is held, making the explicit ASSERT_RTNL() redundant. Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-5-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18udp_tunnel: remove rtnl_lock dependencyStanislav Fomichev
Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because udp_tunnel's RTNL dependency. Introduce new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep). Cover more places in v4: - netlink - udp_tunnel_notify_add_rx_port (ndo_open) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_notify_del_rx_port (ndo_stop) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_get_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - udp_tunnel_drop_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_DROP_INFO - udp_tunnel_nic_reset_ntf (ndo_open) - notifiers - udp_tunnel_nic_netdevice_event, depending on the event: - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - triggers NETDEV_UDP_TUNNEL_DROP_INFO - ethnl_tunnel_info_reply_size - udp_tunnel_nic_set_port_priv (two intel drivers) Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18vxlan: drop sock_lockStanislav Fomichev
We won't be able to sleep soon in vxlan_offload_rx_ports and won't be able to grab sock_lock. Instead of having separate spinlock to manage sockets, rely on rtnl lock. This is similar to how geneve manages its sockets. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-3-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18geneve: rely on rtnl lock in geneve_offload_rx_portsStanislav Fomichev
udp_tunnel_push_rx_port will grab mutex in the next patch so we can't use rcu. geneve_offload_rx_ports is called from geneve_netdevice_event for NETDEV_UDP_TUNNEL_PUSH_INFO and NETDEV_UDP_TUNNEL_DROP_INFO which both have ASSERT_RTNL. Entries are added to and removed from the sock_list under rtnl lock as well (when adding or removing a tunneling device). Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-2-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Merge branch 'net-fix-passive-tfo-socket-having-invalid-napi-id'Jakub Kicinski
David Wei says: ==================== net: fix passive TFO socket having invalid NAPI ID Found a bug where an accepted passive TFO socket returns an invalid NAPI ID (i.e. 0) from SO_INCOMING_NAPI_ID. Add a selftest for this using netdevsim and fix the bug. Patch 1 is a drive-by fix for the lib.sh include in an existing drivers/net/netdevsim/peer.sh selftest. Patch 2 adds a test binary for a simple TFO server/client. Patch 3 adds a selftest for checking that the NAPI ID of a passive TFO socket is valid. This will currently fail. Patch 4 adds a fix for the bug. ==================== Link: https://patch.msgid.link/20250617212102.175711-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18tcp: fix passive TFO socket having invalid NAPI IDDavid Wei
There is a bug with passive TFO sockets returning an invalid NAPI ID 0 from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy receive relies on a correct NAPI ID to process sockets on the right queue. Fix by adding a sk_mark_napi_id_set(). Fixes: e5907459ce7e ("tcp: Record Rx hash and NAPI ID in tcp_child_process") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>