summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-15Bluetooth: btusb: use skb_pull to avoid unsafe access in QCA dump handlingEn-Wei Wu
Use skb_pull() and skb_pull_data() to safely parse QCA dump packets. This avoids direct pointer math on skb->data, which could lead to invalid access if the packet is shorter than expected. Fixes: 20981ce2d5a5 ("Bluetooth: btusb: Add WCN6855 devcoredump support") Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-15Bluetooth: L2CAP: Fix not checking l2cap_chan security levelLuiz Augusto von Dentz
l2cap_check_enc_key_size shall check the security level of the l2cap_chan rather than the hci_conn since for incoming connection request that may be different as hci_conn may already been encrypted using a different security level. Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-15net: devmem: fix kernel panic when netlink socket close after module unloadTaehee Yoo
Kernel panic occurs when a devmem TCP socket is closed after NIC module is unloaded. This is Devmem TCP unregistration scenarios. number is an order. (a)netlink socket close (b)pp destroy (c)uninstall result 1 2 3 OK 1 3 2 (d)Impossible 2 1 3 OK 3 1 2 (e)Kernel panic 2 3 1 (d)Impossible 3 2 1 (d)Impossible (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is closed. (b) page_pool_destroy() is called when the interface is down. (c) mp_ops->uninstall() is called when an interface is unregistered. (d) There is no scenario in mp_ops->uninstall() is called before page_pool_destroy(). Because unregister_netdevice_many_notify() closes interfaces first and then calls mp_ops->uninstall(). (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire netdev_lock(). But if the interface module has already been removed, net_device pointer is invalid, so it causes kernel panic. In summary, there are only 3 possible scenarios. A. sk close -> pp destroy -> uninstall. B. pp destroy -> sk close -> uninstall. C. pp destroy -> uninstall -> sk close. Case C is a kernel panic scenario. In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set binding->dev to NULL. It indicates an bound net_device was unregistered. It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock() if binding->dev is NULL. A new binding->lock is added to protect a dev of a binding. So, lock ordering is like below. priv->lock netdev_lock(dev) binding->lock Tests: Scenario A: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 kill $pid ip link set $interface down modprobe -rv $module Scenario B: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 ip link set $interface down kill $pid modprobe -rv $module Scenario C: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 modprobe -rv $module sleep 5 kill $pid Splat looks like: Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf] CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f Tainted: [B]=BAD_PAGE, [W]=WARN RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746) Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811 RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4 R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88 FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ... netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3)) genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705) ... netlink_release (net/netlink/af_netlink.c:737) ... __sock_release (net/socket.c:647) sock_close (net/socket.c:1393) Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tsnep: fix timestamping with a stacked DSA driverGerhard Engleder
This driver is susceptible to a form of the bug explained in commit c26a2c2ddc01 ("gianfar: Fix TX timestamping with a stacked DSA driver") and in Documentation/networking/timestamping.rst section "Other caveats for MAC drivers", specifically it timestamps any skb which has SKBTX_HW_TSTAMP, and does not consider if timestamping has been enabled in adapter->hwtstamp_config.tx_type. Evaluate the proper TX timestamping condition only once on the TX path (in tsnep_xmit_frame_ring()) and store the result in an additional TX entry flag. Evaluate the new TX entry flag in the TX confirmation path (in tsnep_tx_poll()). This way SKBTX_IN_PROGRESS is set by the driver as required, but never evaluated. SKBTX_IN_PROGRESS shall not be evaluated as it can be set by a stacked DSA driver and evaluating it would lead to unwanted timestamps. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250514195657.25874-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net/tls: fix kernel panic when alloc_page failedPengtao He
We cannot set frag_list to NULL pointer when alloc_page failed. It will be used in tls_strp_check_queue_ok when the next time tls_strp_read_sock is called. This is because we don't reset full_len in tls_strp_flush_anchor_copy() so the recv path will try to continue handling the partial record on the next call but we dettached the rcvq from the frag list. Alternative fix would be to reset full_len. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 Call trace: tls_strp_check_rcv+0x128/0x27c tls_strp_data_ready+0x34/0x44 tls_data_ready+0x3c/0x1f0 tcp_data_ready+0x9c/0xe4 tcp_data_queue+0xf6c/0x12d0 tcp_rcv_established+0x52c/0x798 Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Pengtao He <hept.hept.hept@gmail.com> Link: https://patch.msgid.link/20250514132013.17274-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15Merge tag 'wireless-2025-05-15' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of stragglers: - mac80211: fix syzbot/ubsan in scan counted-by - mt76: fix NAPI handling on driver remove - mt67: fix multicast/ipv6 receive * tag 'wireless-2025-05-15' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl wifi: mt76: disable napi on driver removal ==================== Link: https://patch.msgid.link/20250515121749.61912-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15bnxt_en: bring back rtnl_lock() in the bnxt_open() pathMichael Chan
Error recovery, PCIe AER, resume, and TX timeout will invoke bnxt_open() with netdev_lock only. This will cause RTNL assert failure in netif_set_real_num_tx_queues(), netif_set_real_num_tx_queues(), and netif_set_real_num_tx_queues(). Example error recovery assert: RTNL: assertion failed at net/core/dev.c (3178) WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210 Call Trace: <TASK> ? __pfx_bnxt_msix+0x10/0x10 [bnxt_en] __bnxt_open_nic+0x1ef/0xb20 [bnxt_en] bnxt_open+0xda/0x130 [bnxt_en] bnxt_fw_reset_task+0x21f/0x780 [bnxt_en] process_scheduled_works+0x9d/0x400 For now, bring back rtnl_lock() in all these code paths that can invoke bnxt_open(). In the bnxt_queue_start() error path, we don't have rtnl_lock held so we just change it to call netif_close() instead of bnxt_reset_task() for simplicity. This error path is unlikely so it should be fine. Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250514062908.2766677-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devicesIdo Schimmel
The driver only offloads neighbors that are constructed on top of net devices registered by it or their uppers (which are all Ethernet). The device supports GRE encapsulation and decapsulation of forwarded traffic, but the driver will not offload dummy neighbors constructed on top of GRE net devices as they are not uppers of its net devices: # ip link add name gre1 up type gre tos inherit local 192.0.2.1 remote 198.51.100.1 # ip neigh add 0.0.0.0 lladdr 0.0.0.0 nud noarp dev gre1 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 NOARP (Note that the neighbor is not marked with 'offload') When the driver is reloaded and the existing configuration is replayed, the driver does not perform the same check regarding existing neighbors and offloads the previously added one: # devlink dev reload pci/0000:01:00.0 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 offload NOARP If the neighbor is later deleted, the driver will ignore the notification (given the GRE net device is not its upper) and will therefore keep referencing freed memory, resulting in a use-after-free [1] when the net device is deleted: # ip neigh del 0.0.0.0 lladdr 0.0.0.0 dev gre1 # ip link del dev gre1 Fix by skipping neighbor replay if the net device for which the replay is performed is not our upper. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x1ea/0x200 Read of size 8 at addr ffff888155b0e420 by task ip/2282 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 mlxsw_sp_neigh_entry_update+0x1ea/0x200 mlxsw_sp_router_rif_gone_sync+0x2a8/0x440 mlxsw_sp_rif_destroy+0x1e9/0x750 mlxsw_sp_netdevice_ipip_ol_event+0x3c9/0xdc0 mlxsw_sp_router_netdevice_event+0x3ac/0x15e0 notifier_call_chain+0xca/0x150 call_netdevice_notifiers_info+0x7f/0x100 unregister_netdevice_many_notify+0xc8c/0x1d90 rtnl_dellink+0x34e/0xa50 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x131/0x360 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x121/0x1b0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 8fdb09a7674c ("mlxsw: spectrum_router: Replay neighbours when RIF is made") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/c53c02c904fde32dad484657be3b1477884e9ad6.1747225701.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15Merge tag 'mt76-fixes-2025-05-15' of https://github.com/nbd168/wirelessJohannes Berg
Felix Fietkau says: =================== mt76 fix for 6.15 - disable napi on driver removal to fix warning - fix multicast rx regression on mt7925 =================== Link: https://patch.msgid.link/3b526d06-b717-4d47-817c-a9f47b796a31@nbd.name/ Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-15wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_requestKees Cook
Make sure that n_channels is set after allocating the struct cfg80211_registered_device::int_scan_req member. Seen with syzkaller: UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5 index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]') This was missed in the initial conversions because I failed to locate the allocation likely due to the "sizeof(void *)" not matching the "channels" array type. Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/ Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-15octeontx2-pf: Do not reallocate all ntuple filtersSubbaraya Sundeep
If ntuple filters count is modified followed by unicast filters count using devlink then the ntuple count set by user is ignored and all the ntuple filters are being reallocated. Fix this by storing the ntuple count set by user. Without this patch, say if user tries to modify ntuple count as 8 followed by ucast filter count as 4 using devlink commands then ntuple count is being reverted to default value 16 i.e, not retaining user set value 8. Fixes: 39c469188b6d ("octeontx2-pf: Add ucast filter count configurability via devlink.") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747054357-5850-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtblMing Yen Hsieh
Ensure that the hdr_trans_tlv command is included in the broadcast wtbl to prevent the IPv6 and multicast packet from being dropped by the chip. Cc: stable@vger.kernel.org Fixes: cb1353ef3473 ("wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd") Reported-by: Benjamin Xiao <fossben@pm.me> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://lore.kernel.org/lkml/EmWnO5b-acRH1TXbGnkx41eJw654vmCR-8_xMBaPMwexCnfkvKCdlU5u19CGbaapJ3KRu-l3B-tSUhf8CCQwL0odjo6Cd5YG5lvNeB-vfdg=@pm.me/ Link: https://patch.msgid.link/20250509010421.403022-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-15wifi: mt76: disable napi on driver removalFedor Pchelkin
A warning on driver removal started occurring after commit 9dd05df8403b ("net: warn if NAPI instance wasn't shut down"). Disable tx napi before deleting it in mt76_dma_cleanup(). WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy) Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024 RIP: 0010:__netif_napi_del_locked+0xf0/0x100 Call Trace: <TASK> mt76_dma_cleanup+0x54/0x2f0 [mt76] mt7921_pci_remove+0xd5/0x190 [mt7921e] pci_device_remove+0x47/0xc0 device_release_driver_internal+0x19e/0x200 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xb0 __do_sys_delete_module.isra.0+0x197/0x2e0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Tested with mt7921e but the same pattern can be actually applied to other mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled in their *_dma_init() functions and only toggled off and on again inside their suspend/resume/reset paths. So it should be okay to disable tx napi in such a generic way. Found by Linux Verification Center (linuxtesting.org). Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-14Merge branch 'hv_netvsc-fix-error-nvsp_rndis_pkt_complete-error-status-2'Jakub Kicinski
Michael Kelley says: ==================== hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2" Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. This patch set fixes the root cause of the problem, which is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error in the case of the rndis header. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in a ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer() and just combine them into a single GPA range. Such a fix is very contained. But conceptually it is fixing the problem at the wrong level. So this patch set takes the broader approach of maintaining the already known grouping of contiguous PFNs at a higher level in the netvsc driver code, and propagating that grouping down to the creation of the VMBus message to send to Hyper-V. Maintaining the grouping fixes this problem, and has the added benefit of allowing netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce buffering in CoCo VMs. Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc() to specify multiple GPA ranges. In current code vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI driver, and it always creates a single GPA range. Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of netvsc still don't group contiguous PFNs, this patch is functionally neutral. The VMBus message to Hyper-V still has many GPA ranges, each with a single PFN. But it lays the groundwork for the next patch. Patch 3 changes the higher levels of netvsc to preserve the already known grouping of contiguous PFNs. When the contiguous groupings are passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple PFNs are produced, as expected by Hyper-V. This is point at which the core problem is fixed. Patches 4 and 5 remove code that is no longer necessary after the previous patches. These changes provide a net reduction of about 65 lines of code, which is an added benefit. These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo VMs, and in Dv6-series VMs where the netvsp implementation is in the OpenHCL paravisor instead of the Hyper-V host. These changes are built against kernel version 6.15-rc6. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 ==================== Link: https://patch.msgid.link/20250513000604.1396-1-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()Michael Kelley
With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Remove rmsg_pgcntMichael Kelley
init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Preserve contiguous PFN grouping in the page buffer arrayMichael Kelley
Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost. Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array. This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb. Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messagesMichael Kelley
netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple rangesMichael Kelley
vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor. Update the storvsc driver to reflect this new approach. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14octeontx2-af: Fix CGX Receive countersHariprasad Kelam
Each CGX block supports 4 logical MACs (LMACS). Receive counters CGX_CMR_RX_STAT0-8 are per LMAC and CGX_CMR_RX_STAT9-12 are per CGX. Due a bug in previous patch, stale Per CGX counters values observed. Fixes: 66208910e57a ("octeontx2-af: Support to retrieve CGX LMAC stats") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250513071554.728922-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capabilityBo-Cun Chen
Since MTK_ESW_BIT is a bit number rather than a bitmap, it causes MTK_HAS_CAPS to produce incorrect results. This leads to the ETH driver not declaring MAC capabilities correctly for the MT7988 ESW. Fixes: 445eb6448ed3 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/b8b37f409d1280fad9c4d32521e6207f63cd3213.1747110258.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: libwx: Fix FW mailbox unknown commandJiawen Wu
For the new SW-FW interaction, missing the error return if there is an unknown command. It causes the driver to mistakenly believe that the interaction is complete. This problem occurs when new driver is paired with old firmware, which does not support the new mailbox commands. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/64DBB705D35A0016+20250513021009.145708-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: libwx: Fix FW mailbox reply timeoutJiawen Wu
For the new SW-FW interaction, the timeout waiting for the firmware to return is too short. So that some mailbox commands cannot be completed. Use the 'timeout' parameter instead of fixed timeout value for flexible configuration. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5D5BDE3EA501BDB8+20250513021009.145708-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: txgbe: Fix to calculate EEPROM checksum for AML devicesJiawen Wu
In the new firmware version, the shadow ram reserves some space to store I2C information, so the checksum calculation needs to skip this section. Otherwise, the driver will fail to probe because the invalid EEPROM checksum. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1C6BF7A937237F5A+20250513021009.145708-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14octeontx2-pf: macsec: Fix incorrect max transmit size in TX secySubbaraya Sundeep
MASCEC hardware block has a field called maximum transmit size for TX secy. Max packet size going out of MCS block has be programmed taking into account full packet size which has L2 header,SecTag and ICV. MACSEC offload driver is configuring max transmit size as macsec interface MTU which is incorrect. Say with 1500 MTU of real device, macsec interface created on top of real device will have MTU of 1468(1500 - (SecTag + ICV)). This is causing packets from macsec interface of size greater than or equal to 1468 are not getting transmitted out because driver programmed max transmit size as 1468 instead of 1514(1500 + ETH_HDR_LEN). Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747053756-4529-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14netlink: specs: tc: all actions are indexed arraysJakub Kicinski
Some TC filters have actions listed as indexed arrays of nests and some as just nests. They are all indexed arrays, the handling is common across filters. Fixes: 2267672a6190 ("doc/netlink/specs: Update the tc spec") Link: https://patch.msgid.link/20250513221638.842532-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14netlink: specs: tc: fix a couple of attribute namesJakub Kicinski
Fix up spelling of two attribute names. These are clearly typoes and will prevent C codegen from working. Let's treat this as a fix to get the correction into users' hands ASAP, and prevent anyone depending on the wrong names. Fixes: a1bcfde83669 ("doc/netlink/specs: Add a spec for tc") Link: https://patch.msgid.link/20250513221316.841700-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14octeontx2-pf: Fix ethtool support for SDP representorsHariprasad Kelam
The hardware supports multiple MAC types, including RPM, SDP, and LBK. However, features such as link settings and pause frames are only available on RPM MAC, and not supported on SDP or LBK. This patch updates the ethtool operations logic accordingly to reflect this behavior. Fixes: 2f7f33a09516 ("octeontx2-pf: Add representors for sdp MAC") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-13qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd()Abdun Nihaal
In one of the error paths in qlcnic_sriov_channel_cfg_cmd(), the memory allocated in qlcnic_sriov_alloc_bc_mbx_args() for mailbox arguments is not freed. Fix that by jumping to the error path that frees them, by calling qlcnic_free_mbx_args(). This was found using static analysis. Fixes: f197a7aa6288 ("qlcnic: VF-PF communication channel implementation") Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512044829.36400-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13docs: networking: timestamping: improve stacked PHC sentenceVladimir Oltean
The first paragraph makes no grammatical sense. I suppose a portion of the intended sentece is missing: "[The challenge with ] stacked PHCs (...) is that they uncover bugs". Rephrase, and at the same time simplify the structure of the sentence a little bit, it is not easy to follow. Fixes: 94d9f78f4d64 ("docs: networking: timestamping: add section for stacked PHC devices") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://patch.msgid.link/20250512131751.320283-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13net/mlx5e: Disable MACsec offload for uplink representor profileCarolina Jubran
MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features. If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set. Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features(). Kernel log: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13vsock/test: Fix occasional failure in SIOCOUTQ testsKonstantin Shkolnyy
These tests: "SOCK_STREAM ioctl(SIOCOUTQ) 0 unsent bytes" "SOCK_SEQPACKET ioctl(SIOCOUTQ) 0 unsent bytes" output: "Unexpected 'SIOCOUTQ' value, expected 0, got 64 (CLIENT)". They test that the SIOCOUTQ ioctl reports 0 unsent bytes after the data have been received by the other side. However, sometimes there is a delay in updating this "unsent bytes" counter, and the test fails even though the counter properly goes to 0 several milliseconds later. The delay occurs in the kernel because the used buffer notification callback virtio_vsock_tx_done(), called upon receipt of the data by the other side, doesn't update the counter itself. It delegates that to a kernel thread (via vsock->tx_work). Sometimes that thread is delayed more than the test expects. Change the test to poll SIOCOUTQ until it returns 0 or a timeout occurs. Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: 18ee44ce97c1 ("test/vsock: add ioctl unsent bytes test") Link: https://patch.msgid.link/20250507151456.2577061-1-kshk@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13tools: ynl-gen: Allow multi-attr without nested-attributes againLukas Wunner
Since commit ce6cb8113c84 ("tools: ynl-gen: individually free previous values on double set"), specifying the "multi-attr" property raises an error unless the "nested-attributes" property is specified as well: File "tools/net/ynl/./pyynl/ynl_gen_c.py", line 1147, in _load_nested_sets child = self.pure_nested_structs.get(nested) ^^^^^^ UnboundLocalError: cannot access local variable 'nested' where it is not associated with a value This appears to be a bug since there are existing specs which omit "nested-attributes" on "multi-attr" attributes. Also, according to Documentation/userspace-api/netlink/specs.rst, multi-attr "is the recommended way of implementing arrays (no extra nesting)", suggesting that nesting should even be avoided in favor of multi-attr. Fix the indentation of the if-block introduced by the commit to avoid the error. Fixes: ce6cb8113c84 ("tools: ynl-gen: individually free previous values on double set") Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/d6b58684b7e5bfb628f7313e6893d0097904e1d1.1746940107.git.lukas@wunner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13net: dsa: b53: prevent standalone from trying to forward to other portsJonas Gorski
When bridged ports and standalone ports share a VLAN, e.g. via VLAN uppers, or untagged traffic with a vlan unaware bridge, the ASIC will still try to forward traffic to known FDB entries on standalone ports. But since the port VLAN masks prevent forwarding to bridged ports, this traffic will be dropped. This e.g. can be observed in the bridge_vlan_unaware ping tests, where this breaks pinging with learning on. Work around this by enabling the simplified EAP mode on switches supporting it for standalone ports, which causes the ASIC to redirect traffic of unknown source MAC addresses to the CPU port. Since standalone ports do not learn, there are no known source MAC addresses, so effectively this redirects all incoming traffic to the CPU port. Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20250508091424.26870-1-jonas.gorski@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13Merge branch 'address-eee-regressions-on-ksz-switches-since-v6-9-v6-14'Paolo Abeni
Oleksij Rempel says: ==================== address EEE regressions on KSZ switches since v6.9 (v6.14+) This patch series addresses a regression in Energy Efficient Ethernet (EEE) handling for KSZ switches with integrated PHYs, introduced in kernel v6.9 by commit fe0d4fd9285e ("net: phy: Keep track of EEE configuration"). The first patch updates the DSA driver to allow phylink to properly manage PHY EEE configuration. Since integrated PHYs handle LPI internally and ports without integrated PHYs do not document MAC-level LPI support, dummy MAC LPI callbacks are provided. The second patch removes outdated EEE workarounds from the micrel PHY driver, as they are no longer needed with correct phylink handling. This series addresses the regression for mainline and kernels starting from v6.14. It is not easily possible to fully fix older kernels due to missing infrastructure changes. Tested on KSZ9893 hardware. ==================== Link: https://patch.msgid.link/20250504081434.424489-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13net: phy: micrel: remove KSZ9477 EEE quirks now handled by phylinkOleksij Rempel
The KSZ9477 PHY driver contained workarounds for broken EEE capability advertisements by manually masking supported EEE modes and forcibly disabling EEE if MICREL_NO_EEE was set. With proper MAC-side EEE handling implemented via phylink, these quirks are no longer necessary. Remove MICREL_NO_EEE handling and the use of ksz9477_get_features(). This simplifies the PHY driver and avoids duplicated EEE management logic. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org # v6.14+ Link: https://patch.msgid.link/20250504081434.424489-3-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13net: dsa: microchip: let phylink manage PHY EEE configuration on KSZ switchesOleksij Rempel
Phylink expects MAC drivers to provide LPI callbacks to properly manage Energy Efficient Ethernet (EEE) configuration. On KSZ switches with integrated PHYs, LPI is internally handled by hardware, while ports without integrated PHYs have no documented MAC-level LPI support. Provide dummy mac_disable_tx_lpi() and mac_enable_tx_lpi() callbacks to satisfy phylink requirements. Also, set default EEE capabilities during phylink initialization where applicable. Since phylink can now gracefully handle optional EEE configuration, remove the need for the MICREL_NO_EEE PHY flag. This change addresses issues caused by incomplete EEE refactoring introduced in commit fe0d4fd9285e ("net: phy: Keep track of EEE configuration"). It is not easily possible to fix all older kernels, but this patch ensures proper behavior on latest kernels and can be considered for backporting to stable kernels starting from v6.14. Fixes: fe0d4fd9285e ("net: phy: Keep track of EEE configuration") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org # v6.14+ Link: https://patch.msgid.link/20250504081434.424489-2-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-12net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENINGVladimir Oltean
It has been reported that when under a bridge with stp_state=1, the logs get spammed with this message: [ 251.734607] fsl_dpaa2_eth dpni.5 eth0: Couldn't decode source port Further debugging shows the following info associated with packets: source_port=-1, switch_id=-1, vid=-1, vbid=1 In other words, they are data plane packets which are supposed to be decoded by dsa_tag_8021q_find_port_by_vbid(), but the latter (correctly) refuses to do so, because no switch port is currently in BR_STATE_LEARNING or BR_STATE_FORWARDING - so the packet is effectively unexpected. The error goes away after the port progresses to BR_STATE_LEARNING in 15 seconds (the default forward_time of the bridge), because then, dsa_tag_8021q_find_port_by_vbid() can correctly associate the data plane packets with a plausible bridge port in a plausible STP state. Re-reading IEEE 802.1D-1990, I see the following: "4.4.2 Learning: (...) The Forwarding Process shall discard received frames." IEEE 802.1D-2004 further clarifies: "DISABLED, BLOCKING, LISTENING, and BROKEN all correspond to the DISCARDING port state. While those dot1dStpPortStates serve to distinguish reasons for discarding frames, the operation of the Forwarding and Learning processes is the same for all of them. (...) LISTENING represents a port that the spanning tree algorithm has selected to be part of the active topology (computing a Root Port or Designated Port role) but is temporarily discarding frames to guard against loops or incorrect learning." Well, this is not what the driver does - instead it sets mac[port].ingress = true. To get rid of the log spam, prevent unexpected data plane packets to be received by software by discarding them on ingress in the LISTENING state. In terms of blame attribution: the prints only date back to commit d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID"). However, the settings would permit a LISTENING port to forward to a FORWARDING port, and the standard suggests that's not OK. Fixes: 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250509113816.2221992-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: Lock lower level devices when updating featuresCosmin Ratiu
__netdev_update_features() expects the netdevice to be ops-locked, but it gets called recursively on the lower level netdevices to sync their features, and nothing locks those. This commit fixes that, with the assumption that it shouldn't be possible for both higher-level and lover-level netdevices to require the instance lock, because that would lead to lock dependency warnings. Without this, playing with higher level (e.g. vxlan) netdevices on top of netdevices with instance locking enabled can run into issues: WARNING: CPU: 59 PID: 206496 at ./include/net/netdev_lock.h:17 netif_napi_add_weight_locked+0x753/0xa60 [...] Call Trace: <TASK> mlx5e_open_channel+0xc09/0x3740 [mlx5_core] mlx5e_open_channels+0x1f0/0x770 [mlx5_core] mlx5e_safe_switch_params+0x1b5/0x2e0 [mlx5_core] set_feature_lro+0x1c2/0x330 [mlx5_core] mlx5e_handle_feature+0xc8/0x140 [mlx5_core] mlx5e_set_features+0x233/0x2e0 [mlx5_core] __netdev_update_features+0x5be/0x1670 __netdev_update_features+0x71f/0x1670 dev_ethtool+0x21c5/0x4aa0 dev_ioctl+0x438/0xae0 sock_ioctl+0x2ba/0x690 __x64_sys_ioctl+0xa78/0x1700 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250509072850.2002821-1-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: cadence: macb: Fix a possible deadlock in macb_halt_tx.Mathieu Othacehe
There is a situation where after THALT is set high, TGO stays high as well. Because jiffies are never updated, as we are in a context with interrupts disabled, we never exit that loop and have a deadlock. That deadlock was noticed on a sama5d4 device that stayed locked for days. Use retries instead of jiffies so that the timeout really works and we do not have a deadlock anymore. Fixes: e86cd53afc590 ("net/macb: better manage tx errors") Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509121935.16282-1-othacehe@gnu.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09Merge tag 'batadv-net-pullrequest-20250509' of ↵Jakub Kicinski
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix duplicate MAC address check, by Matthias Schiffer * tag 'batadv-net-pullrequest-20250509' of git://git.open-mesh.org/linux-merge: batman-adv: fix duplicate MAC address check ==================== Link: https://patch.msgid.link/20250509090240.107796-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09net: mctp: Ensure keys maintain only one ref to corresponding devAndrew Jeffery
mctp_flow_prepare_output() is called in mctp_route_output(), which places outbound packets onto a given interface. The packet may represent a message fragment, in which case we provoke an unbalanced reference count to the underlying device. This causes trouble if we ever attempt to remove the interface: [ 48.702195] usb 1-1: USB disconnect, device number 2 [ 58.883056] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 69.022548] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 79.172568] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 ... Predicate the invocation of mctp_dev_set_key() in mctp_flow_prepare_output() on not already having associated the device with the key. It's not yet realistic to uphold the property that the key maintains only one device reference earlier in the transmission sequence as the route (and therefore the device) may not be known at the time the key is associated with the socket. Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers") Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Link: https://patch.msgid.link/20250508-mctp-dev-refcount-v1-1-d4f965c67bb5@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09tests/ncdevmem: Fix double-free of queue arrayCosmin Ratiu
netdev_bind_rx takes ownership of the queue array passed as parameter and frees it, so a queue array buffer cannot be reused across multiple netdev_bind_rx calls. This commit fixes that by always passing in a newly created queue array to all netdev_bind_rx calls in ncdevmem. Fixes: 85585b4bc8d8 ("selftests: add ncdevmem, netcat for devmem TCP") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250508084434.1933069-1-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09net: mctp: Don't access ifa_index when missingMatt Johnston
In mctp_dump_addrinfo, ifa_index can be used to filter interfaces, but only when the struct ifaddrmsg is provided. Otherwise it will be comparing to uninitialised memory - reproducible in the syzkaller case from dhcpd, or busybox "ip addr show". The kernel MCTP implementation has always filtered by ifa_index, so existing userspace programs expecting to dump MCTP addresses must already be passing a valid ifa_index value (either 0 or a real index). BUG: KMSAN: uninit-value in mctp_dump_addrinfo+0x208/0xac0 net/mctp/device.c:128 mctp_dump_addrinfo+0x208/0xac0 net/mctp/device.c:128 rtnl_dump_all+0x3ec/0x5b0 net/core/rtnetlink.c:4380 rtnl_dumpit+0xd5/0x2f0 net/core/rtnetlink.c:6824 netlink_dump+0x97b/0x1690 net/netlink/af_netlink.c:2309 Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Reported-by: syzbot+e76d52dadc089b9d197f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68135815.050a0220.3a872c.000e.GAE@google.com/ Reported-by: syzbot+1065a199625a388fce60@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/681357d6.050a0220.14dd7d.000d.GAE@google.com/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20250508-mctp-addr-dump-v2-1-c8a53fd2dd66@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09tools/net/ynl: ethtool: fix crash when Hardware Clock info is missingHangbin Liu
Fix a crash in the ethtool YNL implementation when Hardware Clock information is not present in the response. This ensures graceful handling of devices or drivers that do not provide this optional field. e.g. Traceback (most recent call last): File "/net/tools/net/ynl/pyynl/./ethtool.py", line 438, in <module> main() ~~~~^^ File "/net/tools/net/ynl/pyynl/./ethtool.py", line 341, in main print(f'PTP Hardware Clock: {tsinfo["phc-index"]}') ~~~~~~^^^^^^^^^^^^^ KeyError: 'phc-index' Fixes: f3d07b02b2b8 ("tools: ynl: ethtool testing tool") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508035414.82974-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09Merge branch 'net_sched-gso_skb-flushing'David S. Miller
Cong Wang says: ==================== net_sched: Fix gso_skb flushing during qdisc change This patchset contains a bug fix and its test cases, please check each patch description for more details. To keep the bug fix minimum, I intentionally limit the code changes to the cases reported here. --- v2: added a missing qlen-- fixed the new boolean parameter for two qdiscs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-09selftests/tc-testing: Add qdisc limit trimming testsCong Wang
Added new test cases for FQ, FQ_CODEL, FQ_PIE, and HHF qdiscs to verify queue trimming behavior when the qdisc limit is dynamically reduced. Each test injects packets, reduces the qdisc limit, and checks that the new limit is enforced. This is still best effort since timing qdisc backlog is not easy. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-09net_sched: Flush gso_skb list too during ->change()Cong Wang
Previously, when reducing a qdisc's limit via the ->change() operation, only the main skb queue was trimmed, potentially leaving packets in the gso_skb list. This could result in NULL pointer dereference when we only check sch->limit against sch->q.qlen. This patch introduces a new helper, qdisc_dequeue_internal(), which ensures both the gso_skb list and the main queue are properly flushed when trimming excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) are updated to use this helper in their ->change() routines. Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme") Reported-by: Will <willsroot@protonmail.com> Reported-by: Savy <savy@syst3mfailure.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-08net: qede: Initialize qede_ll_ops with designated initializerNathan Chancellor
After a recent change [1] in clang's randstruct implementation to randomize structures that only contain function pointers, there is an error because qede_ll_ops get randomized but does not use a designated initializer for the first member: drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer 206 | { | ^ Explicitly initialize the common member using a designated initializer to fix the build. Cc: stable@vger.kernel.org Fixes: 035f7f87b729 ("randstruct: Enable Clang support") Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b200bcb082 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20250507-qede-fix-clang-randstruct-v1-1-5ccc15626fba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08Merge tag 'for-net-2025-05-08' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags - hci_event: Fix not using key encryption size when its known * tag 'for-net-2025-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix not using key encryption size when its known Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags ==================== Link: https://patch.msgid.link/20250508150927.385675-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>