summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-17Merge branch 'for-next' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Add RDMA support for Intel IPU E2000 in idpf Tatyana Nikolova says: This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ IWL reviews: v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/ v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/ v1 (split from previous series): https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/ v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/ RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/ RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: idpf: implement get LAN MMIO memory regions idpf: implement IDC vport aux driver MTU change handler idpf: implement remaining IDC RDMA core callbacks and handlers idpf: implement RDMA vport auxiliary dev create, init, and destroy idpf: implement core RDMA auxiliary dev create, init, and destroy idpf: use reserved RDMA vectors from control plane ==================== Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-17net: fix segmentation after TCP/UDP fraglist GROFelix Fietkau
Since "net: gro: use cb instead of skb->network_header", the skb network header is no longer set in the GRO path. This breaks fraglist segmentation, which relies on ip_hdr()/tcp_hdr() to check for address/port changes. Fix this regression by selectively setting the network header for merged segment skbs. Fixes: 186b1ea73ad8 ("net: gro: use cb instead of skb->network_header") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250705150622.10699-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-16ipv6: mcast: Delay put pmc->idev in mld_del_delrec()Yue Haibing
pmc->idev is still used in ip6_mc_clear_src(), so as mld_clear_delrec() does, the reference should be put after ip6_mc_clear_src() return. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250714141957.3301871-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge branch 'net-mlx5e-add-support-for-pcie-congestion-events'Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5e: Add support for PCIe congestion events Dragos says: PCIe congestion events are events generated by the firmware when the device side has sustained PCIe inbound or outbound traffic above certain thresholds. The high and low threshold are hysteresis thresholds to prevent flapping: once the high threshold has been reached, a low threshold event will be triggered only after the bandwidth usage went below the low threshold. This series adds support for receiving and exposing such events as ethtool counters. 2 new pairs of counters are exposed: pci_bw_in/outbound_high/low. These should help the user understand if the device PCI is under pressure. Planned followup patches: - Allow configuration of thresholds through devlink. - Add ethtool counter for wakeups which did not result in any state change. ==================== Link: https://patch.msgid.link/1752589821-145787-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net/mlx5e: Add device PCIe congestion ethtool statsDragos Tatulea
Implement the PCIe Congestion Event notifier which triggers a work item to query the PCIe Congestion Event object. The result of the congestion state is reflected in the new ethtool stats: * pci_bw_inbound_high: the device has crossed the high threshold for inbound PCIe traffic. * pci_bw_inbound_low: the device has crossed the low threshold for inbound PCIe traffic * pci_bw_outbound_high: the device has crossed the high threshold for outbound PCIe traffic. * pci_bw_outbound_low: the device has crossed the low threshold for outbound PCIe traffic The high and low thresholds are currently configured at 90% and 75%. These are hysteresis thresholds which help to check if the PCI bus on the device side is in a congested state. If low + 1 = high then the device is in a congested state. If low == high then the device is not in a congested state. The counters are also documented. A follow-up patch will make the thresholds configurable. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752589821-145787-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net/mlx5e: Create/destroy PCIe Congestion Event objectDragos Tatulea
Add initial infrastructure to create and destroy the PCIe Congestion Event object if the object is supported. The verb for the object creation function is "set" instead of "create" because the function will accommodate the modify operation as well in a subsequent patch. The next patches will hook it up to the event handler and will add actual functionality. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752589821-145787-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16s390/net: Remove NETIUCV device driverNagamani PV
The netiucv driver creates TCP/IP interfaces over IUCV between Linux guests on z/VM and other z/VM entities. Rationale for removal: - NETIUCV connections are only supported for compatibility with earlier versions and not to be used for new network setups, since at least Linux kernel 4.0. - No known active users, use cases, or product dependencies - The driver is no longer relevant for z/VM networking; preferred methods include: * Device pass-through (e.g., OSA, RoCE) * z/VM Virtual Switch (VSWITCH) The IUCV mechanism itself remains supported and is actively used via AF_IUCV, hvc_iucv, and smsg_iucv. Signed-off-by: Nagamani PV <nagamani@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250715074210.3999296-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge branch 'expose-refclk-for-rmii-and-enable-rmii'Jakub Kicinski
Ryan Wanner says: ==================== Expose REFCLK for RMII and enable RMII This set allows the REFCLK property to be exposed as a dt-property to properly reflect the correct RMII layout. RMII can take an external or internal provided REFCLK, since this is not SoC dependent but board dependent this must be exposed as a DT property for the macb driver. This set also enables RMII mode for the SAMA7 SoCs gigabit mac. v1: https://lore.kernel.org/cover.1750346271.git.Ryan.Wanner@microchip.com ==================== Link: https://patch.msgid.link/cover.1752510727.git.Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flagRyan Wanner
Remove USARIO_CLKEN flag since this is now a device tree argument and not fixed to the SoC. This will instead be selected by the "cdns,refclk-ext" device tree property. Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> Link: https://patch.msgid.link/1e7a8c324526f631f279925aa8a6aa937d55c796.1752510727.git.Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: cadence: macb: Enable RMII for SAMA7 gemRyan Wanner
This macro enables the RMII mode bit in the USRIO register when RMII mode is requested. Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> Link: https://patch.msgid.link/6698836e4ee7df5f6bee181f0d2e38d4b8e4cec2.1752510727.git.Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: cadence: macb: Expose REFCLK as a device tree propertyRyan Wanner
The RMII and RGMII can both support internal or external provided REFCLKs 50MHz and 125MHz respectively. Since this is dependent on the board that the SoC is on this needs to be set via the device tree. This property flag is checked in the MACB DT node so the REFCLK cap is configured the correct way for the RMII or RGMII is configured on the board. Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> Link: https://patch.msgid.link/7f9b65896d6b7b48275bc527b72a16347f8ce10a.1752510727.git.Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16dt-bindings: net: cdns,macb: Add external REFCLK propertyRyan Wanner
REFCLK can be provided by an external source so this should be exposed by a DT property. The REFCLK is used for RMII and in some SoCs that use this driver the RGMII 125MHz clk can also be provided by an external source. Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/d558467c4d5b27fb3135ffdead800b14cd9c6c0a.1752510727.git.Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge branch 'selftest-net-add-selftest-for-netpoll'Jakub Kicinski
Breno Leitao says: ==================== selftest: net: Add selftest for netpoll I am submitting a new selftest for the netpoll subsystem specifically targeting the case where the RX is polling in the TX path, which is a case that we don't have any test in the tree today. This is done when netpoll_poll_dev() called, and this test creates a scenario when that is probably. The test does the following: 1) Configuring a single RX/TX queue to increase contention on the interface. 2) Generating background traffic to saturate the network, mimicking real-world congestion. 3) Sending netconsole messages to trigger netpoll polling and monitor its behavior. 4) Using dynamic netconsole targets via configfs, with the ability to delete and recreate targets during the test. 5) Running bpftrace in parallel to verify that netpoll_poll_dev() is called when expected. If it is called, then the test passes, otherwise the test is marked as skipped. In order to achieve it, I stole Jakub's bpftrace helper from [1], and did some small changes that I found useful to use the helper. So, this patchset basically contains: 1) The code stolen from Jakub 2) Improvements on bpftrace() helper 3) The selftest itself Link: https://lore.kernel.org/all/20250421222827.283737-22-kuba@kernel.org/ [1] ==================== Link: https://patch.msgid.link/20250714-netpoll_test-v7-0-c0220cfaa63e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16selftests: net: add netpoll basic functionality testBreno Leitao
Add a basic selftest for the netpoll polling mechanism, specifically targeting the netpoll poll() side. The test creates a scenario where network transmission is running at maximum speed, and netpoll needs to poll the NIC. This is achieved by: 1. Configuring a single RX/TX queue to create contention 2. Generating background traffic to saturate the interface 3. Sending netconsole messages to trigger netpoll polling 4. Using dynamic netconsole targets via configfs 5. Delete and create new netconsole targets after some messages 6. Start a bpftrace in parallel to make sure netpoll_poll_dev() is called 7. If bpftrace exists and netpoll_poll_dev() was called, stop. The test validates a critical netpoll code path by monitoring traffic flow and ensuring netpoll_poll_dev() is called when the normal TX path is blocked. This addresses a gap in netpoll test coverage for a path that is tricky for the network stack. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250714-netpoll_test-v7-3-c0220cfaa63e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16selftests: drv-net: Strip '@' prefix from bpftrace map keysBreno Leitao
The '@' prefix in bpftrace map keys is specific to bpftrace and can be safely removed when processing results. This patch modifies the bpftrace utility to strip the '@' from map keys before storing them in the result dictionary, making the keys more consistent with Python conventions. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250714-netpoll_test-v7-2-c0220cfaa63e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16selftests: drv-net: add helper/wrapper for bpftraceJakub Kicinski
bpftrace is very useful for low level driver testing. perf or trace-cmd would also do for collecting data from tracepoints, but they require much more post-processing. Add a wrapper for running bpftrace and sanitizing its output. bpftrace has JSON output, which is great, but it prints loose objects and in a slightly inconvenient format. We have to read the objects line by line, and while at it return them indexed by the map name. Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250714-netpoll_test-v7-1-c0220cfaa63e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16doc: xdp: Clarify driver implementation for XDP Rx metadataSong Yoong Siang
Clarify that drivers must remove device-reserved metadata from the data_meta area before passing frames to XDP programs. Additionally, expand the explanation of how userspace and BPF programs should coordinate the use of METADATA_SIZE, and add a detailed diagram to illustrate pointer adjustments and metadata layout. Also describe the requirements and constraints enforced by bpf_xdp_adjust_meta(). Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/r/20250716154846.3513575-1-yoong.siang.song@intel.com
2025-07-16Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-07-15 (ixgbe, fm10k, i40e, ice) Arnd Bergmann resolves compile issues with large NR_CPUS for ixgbe, fm10k, and i40e. For ice: Dave adds a NULL check for LAG netdev. Michal corrects a pointer check in debugfs initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: check correct pointer in fwlog debugfs ice: add NULL check in eswitch lag check ethernet: intel: fix building with large NR_CPUS ==================== Link: https://patch.msgid.link/20250715202948.3841437-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: airoha: fix potential use-after-free in airoha_npu_get()Alok Tiwari
np->name was being used after calling of_node_put(np), which releases the node and can lead to a use-after-free bug. Previously, of_node_put(np) was called unconditionally after of_find_device_by_node(np), which could result in a use-after-free if pdev is NULL. This patch moves of_node_put(np) after the error check to ensure the node is only released after both the error and success cases are handled appropriately, preventing potential resource issues. Fixes: 23290c7bc190 ("net: airoha: Introduce Airoha NPU support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250715143102.3458286-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16ipv6: mcast: Simplify mld_clear_{report|query}()Yue Haibing
Use __skb_queue_purge() instead of re-implementing it. Note that it uses kfree_skb_reason() instead of kfree_skb() internally, and pass SKB_DROP_REASON_QUEUE_PURGE drop reason to the kfree_skb tracepoint. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250715120709.3941510-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16vsock/test: fix vsock_ioctl_int() check for unsupported ioctlStefano Garzarella
`vsock_do_ioctl` returns -ENOIOCTLCMD if an ioctl support is not implemented, like for SIOCINQ before commit f7c722659275 ("vsock: Add support for SIOCINQ ioctl"). In net/socket.c, -ENOIOCTLCMD is re-mapped to -ENOTTY for the user space. So, our test suite, without that commit applied, is failing in this way: 34 - SOCK_STREAM ioctl(SIOCINQ) functionality...ioctl(21531): Inappropriate ioctl for device Return false in vsock_ioctl_int() to skip the test in this case as well, instead of failing. Fixes: 53548d6bffac ("test/vsock: Add retry mechanism to ioctl wrapper") Cc: niuxuewei.nxw@antgroup.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Xuewei Niu <niuxuewei.nxw@antgroup.com> Link: https://patch.msgid.link/20250715093233.94108-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16tcp: fix UaF in tcp_prune_ofo_queue()Paolo Abeni
The CI reported a UaF in tcp_prune_ofo_queue(): BUG: KASAN: slab-use-after-free in tcp_prune_ofo_queue+0x55d/0x660 Read of size 4 at addr ffff8880134729d8 by task socat/20348 CPU: 0 UID: 0 PID: 20348 Comm: socat Not tainted 6.16.0-rc5-virtme #1 PREEMPT(full) Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0x82/0xd0 print_address_description.constprop.0+0x2c/0x400 print_report+0xb4/0x270 kasan_report+0xca/0x100 tcp_prune_ofo_queue+0x55d/0x660 tcp_try_rmem_schedule+0x855/0x12e0 tcp_data_queue+0x4dd/0x2260 tcp_rcv_established+0x5e8/0x2370 tcp_v4_do_rcv+0x4ba/0x8c0 __release_sock+0x27a/0x390 release_sock+0x53/0x1d0 tcp_sendmsg+0x37/0x50 sock_write_iter+0x3c1/0x520 vfs_write+0xc09/0x1210 ksys_write+0x183/0x1d0 do_syscall_64+0xc1/0x380 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcf73ef2337 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffd4f924708 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf73ef2337 RDX: 0000000000002000 RSI: 0000555f11d1a000 RDI: 0000000000000008 RBP: 0000555f11d1a000 R08: 0000000000002000 R09: 0000000000000000 R10: 0000000000000040 R11: 0000000000000246 R12: 0000000000000008 R13: 0000000000002000 R14: 0000555ee1a44570 R15: 0000000000002000 </TASK> Allocated by task 20348: kasan_save_stack+0x24/0x50 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x59/0x70 kmem_cache_alloc_node_noprof+0x110/0x340 __alloc_skb+0x213/0x2e0 tcp_collapse+0x43f/0xff0 tcp_try_rmem_schedule+0x6b9/0x12e0 tcp_data_queue+0x4dd/0x2260 tcp_rcv_established+0x5e8/0x2370 tcp_v4_do_rcv+0x4ba/0x8c0 __release_sock+0x27a/0x390 release_sock+0x53/0x1d0 tcp_sendmsg+0x37/0x50 sock_write_iter+0x3c1/0x520 vfs_write+0xc09/0x1210 ksys_write+0x183/0x1d0 do_syscall_64+0xc1/0x380 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 20348: kasan_save_stack+0x24/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x38/0x50 kmem_cache_free+0x149/0x330 tcp_prune_ofo_queue+0x211/0x660 tcp_try_rmem_schedule+0x855/0x12e0 tcp_data_queue+0x4dd/0x2260 tcp_rcv_established+0x5e8/0x2370 tcp_v4_do_rcv+0x4ba/0x8c0 __release_sock+0x27a/0x390 release_sock+0x53/0x1d0 tcp_sendmsg+0x37/0x50 sock_write_iter+0x3c1/0x520 vfs_write+0xc09/0x1210 ksys_write+0x183/0x1d0 do_syscall_64+0xc1/0x380 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff888013472900 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 216 bytes inside of freed 232-byte region [ffff888013472900, ffff8880134729e8) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13472 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x80000000000040(head|node=0|zone=1) page_type: f5(slab) raw: 0080000000000040 ffff88800198fb40 ffffea0000347b10 ffffea00004f5290 raw: 0000000000000000 0000000000120012 00000000f5000000 0000000000000000 head: 0080000000000040 ffff88800198fb40 ffffea0000347b10 ffffea00004f5290 head: 0000000000000000 0000000000120012 00000000f5000000 0000000000000000 head: 0080000000000001 ffffea00004d1c81 00000000ffffffff 00000000ffffffff head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888013472880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888013472900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888013472980: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc ^ ffff888013472a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888013472a80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb Indeed tcp_prune_ofo_queue() is reusing the skb dropped a few lines above. The caller wants to enqueue 'in_skb', lets check space vs the latter. Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: syzbot+865aca08c0533171bf6a@syzkaller.appspotmail.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/b78d2d9bdccca29021eed9a0e7097dd8dc00f485.1752567053.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net/mlx5: Correctly set gso_size when LRO is usedChristoph Paasch
gso_size is expected by the networking stack to be the size of the payload (thus, not including ethernet/IP/TCP-headers). However, cqe_bcnt is the full sized frame (including the headers). Dividing cqe_bcnt by lro_num_seg will then give incorrect results. For example, running a bpftrace higher up in the TCP-stack (tcp_event_data_recv), we commonly have gso_size set to 1450 or 1451 even though in reality the payload was only 1448 bytes. This can have unintended consequences: - In tcp_measure_rcv_mss() len will be for example 1450, but. rcv_mss will be 1448 (because tp->advmss is 1448). Thus, we will always recompute scaling_ratio each time an LRO-packet is received. - In tcp_gro_receive(), it will interfere with the decision whether or not to flush and thus potentially result in less gro'ed packets. So, we need to discount the protocol headers from cqe_bcnt so we can actually divide the payload by lro_num_seg to get the real gso_size. v2: - Use "(unsigned char *)tcp + tcp->doff * 4 - skb->data)" to compute header-len (Tariq Toukan <tariqt@nvidia.com>) - Improve commit-message (Gal Pressman <gal@nvidia.com>) Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250715-cpaasch-pf-925-investigate-incorrect-gso_size-on-cx-7-nic-v2-1-e06c3475f3ac@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16selftests: packetdrill: correct the expected timing in tcp_rcv_big_endseqJakub Kicinski
Commit f5fda1a86884 ("selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt") added this test recently, but it's failing with: # tcp_rcv_big_endseq.pkt:41: error handling packet: timing error: expected outbound packet at 1.230105 sec but happened at 1.190101 sec; tolerance 0.005046 sec # script packet: 1.230105 . 1:1(0) ack 54001 win 0 # actual packet: 1.190101 . 1:1(0) ack 54001 win 0 It's unclear why the test expects the ack to be delayed. Correct it. Link: https://patch.msgid.link/20250715142849.959444-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16ethtool: Don't check for RXFH fields conflict when no input_xfrm is requestedGal Pressman
The requirement of ->get_rxfh_fields() in ethtool_set_rxfh() is there to verify that we have no conflict of input_xfrm with the RSS fields options, there is no point in doing it if input_xfrm is not supported/requested. This is under the assumption that a driver that supports input_xfrm will also support ->get_rxfh_fields(), so add a WARN_ON() to ethtool_check_ops() to verify it, and remove the op NULL check. This fixes the following error in mlx4_en, which doesn't support getting/setting RXFH fields. $ ethtool --set-rxfh-indir eth2 hfunc xor Cannot set RX flow hash configuration: Operation not supported Fixes: 72792461c8e8 ("net: ethtool: don't mux RXFH via rxnfc callbacks") Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250715140754.489677-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16selftests: rtnetlink: fix addrlft test flakiness on power-saving systemsHangbin Liu
Jakub reported that the rtnetlink test for the preferred lifetime of an address has become quite flaky. The issue started appearing around the 6.16 merge window in May, and the test fails with: FAIL: preferred_lft addresses remaining The flakiness might be related to power-saving behavior, as address expiration is handled by a "power-efficient" workqueue. To address this, use slowwait to check more frequently whether the address still exists. This reduces the likelihood of the system entering a low-power state during the test, improving reliability. Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250715043459.110523-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge tag 'probes-fixes-v6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - fprobe-event: The @params variable was being used in an error path without being initialized. The fix to return an error code. * tag 'probes-fixes-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Avoid using params uninitialized in parse_btf_arg()
2025-07-16Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant ↵Zijun Hu
without board ID For GF variant of WCN6855 without board ID programmed btusb_generate_qca_nvm_name() will chose wrong NVM 'qca/nvm_usb_00130201.bin' to download. Fix by choosing right NVM 'qca/nvm_usb_00130201_gf.bin'. Also simplify NVM choice logic of btusb_generate_qca_nvm_name(). Fixes: d6cba4e6d0e2 ("Bluetooth: btusb: Add support using different nvm for variant WCN6855 controller") Signed-off-by: Zijun Hu <zijun.hu@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmapChristian Eggers
The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: add missing braces when using macro parametersChristian Eggers
Macro parameters should always be put into braces when accessing it. Fixes: 4fc9857ab8c6 ("Bluetooth: hci_sync: Add check simultaneous roles support") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: fix typos in macrosChristian Eggers
The provided macro parameter is named 'dev' (rather than 'hdev', which may be a variable on the stack where the macro is used). Fixes: a9a830a676a9 ("Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE") Fixes: 6126ffabba6b ("Bluetooth: Introduce HCI_CONN_FLAG_DEVICE_PRIVACY device flag") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeoutLuiz Augusto von Dentz
This replaces the usage of HCI_ERROR_REMOTE_USER_TERM, which as the name suggest is to indicate a regular disconnection initiated by an user, with HCI_ERROR_AUTH_FAILURE to indicate the session has timeout thus any pairing shall be considered as failed. Fixes: 1e91c29eb60c ("Bluetooth: Use hci_disconnect for immediate disconnection from SMP") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: If an unallowed command is received consider it a failureLuiz Augusto von Dentz
If a command is received while a bonding is ongoing consider it a pairing failure so the session is cleanup properly and the device is disconnected immediately instead of continuing with other commands that may result in the session to get stuck without ever completing such as the case bellow: > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Identity Information (0x08) len 16 Identity resolving key[16]: d7e08edef97d3e62cd2331f82d8073b0 > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Signing Information (0x0a) len 16 Signature key[16]: 1716c536f94e843a9aea8b13ffde477d Bluetooth: hci0: unexpected SMP command 0x0a from XX:XX:XX:XX:XX:XX > ACL Data RX: Handle 2048 flags 0x02 dlen 12 SMP: Identity Address Information (0x09) len 7 Address: XX:XX:XX:XX:XX:XX (Intel Corporate) While accourding to core spec 6.1 the expected order is always BD_ADDR first first then CSRK: When using LE legacy pairing, the keys shall be distributed in the following order: LTK by the Peripheral EDIV and Rand by the Peripheral IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral LTK by the Central EDIV and Rand by the Central IRK by the Central BD_ADDR by the Central CSRK by the Central When using LE Secure Connections, the keys shall be distributed in the following order: IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral IRK by the Central BD_ADDR by the Central CSRK by the Central According to the Core 6.1 for commands used for key distribution "Key Rejected" can be used: '3.6.1. Key distribution and generation A device may reject a distributed key by sending the Pairing Failed command with the reason set to "Key Rejected". Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: btintel: Check if controller is ISO capable on ↵Luiz Augusto von Dentz
btintel_classify_pkt_type Due to what seem to be a bug with variant version returned by some firmwares the code may set hdev->classify_pkt_type with btintel_classify_pkt_type when in fact the controller doesn't even support ISO channels feature but may use the handle range expected from a controllers that does causing the packets to be reclassified as ISO causing several bugs. To fix the above btintel_classify_pkt_type will attempt to check if the controller really supports ISO channels and in case it doesn't don't reclassify even if the handle range is considered to be ISO, this is considered safer than trying to fix the specific controller/firmware version as that could change over time and causing similar problems in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219553 Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2100565 Link: https://github.com/StarLabsLtd/firmware/issues/180 Fixes: f25b7fd36cc3 ("Bluetooth: Add vendor-specific packet classification for ISO data") Cc: stable@vger.kernel.org Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Sean Rhodes <sean@starlabs.systems>
2025-07-16Bluetooth: hci_sync: fix connectable extended advertising when using static ↵Alessandro Gasbarroni
random address Currently, the connectable flag used by the setup of an extended advertising instance drives whether we require privacy when trying to pass a random address to the advertising parameters (Own Address). If privacy is not required, then it automatically falls back to using the controller's public address. This can cause problems when using controllers that do not have a public address set, but instead use a static random address. e.g. Assume a BLE controller that does not have a public address set. The controller upon powering is set with a random static address by default by the kernel. < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 Address: E4:AF:26:D8:3E:3A (Static) > HCI Event: Command Complete (0x0e) plen 4 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) Setting non-connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g 1 correctly sets Own address type as Random < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Random (0x01) Setting connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g -c 1 mistakenly sets Own address type to Public (which causes to use Public Address 00:00:00:00:00:00) < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Public (0x00) This causes either the controller to emit an Invalid Parameters error or to mishandle the advertising. This patch makes sure that we use the already set static random address when requesting a connectable extended advertising when we don't require privacy and our public address is not set (00:00:00:00:00:00). Fixes: 3fe318ee72c5 ("Bluetooth: move hci_get_random_address() to hci_sync") Signed-off-by: Alessandro Gasbarroni <alex.gasbarroni@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()Kuniyuki Iwashima
syzbot reported null-ptr-deref in l2cap_sock_resume_cb(). [0] l2cap_sock_resume_cb() has a similar problem that was fixed by commit 1bff51ea59a9 ("Bluetooth: fix use-after-free error in lock_sock_nested()"). Since both l2cap_sock_kill() and l2cap_sock_resume_cb() are executed under l2cap_sock_resume_cb(), we can avoid the issue simply by checking if chan->data is NULL. Let's not access to the killed socket in l2cap_sock_resume_cb(). [0]: BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:82 [inline] BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] BUG: KASAN: null-ptr-deref in l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 Write of size 8 at addr 0000000000000570 by task kworker/u9:0/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u9:0 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Workqueue: hci0 hci_rx_work Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:501 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 print_report+0x58/0x84 mm/kasan/report.c:524 kasan_report+0xb0/0x110 mm/kasan/report.c:634 check_region_inline mm/kasan/generic.c:-1 [inline] kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189 __kasan_check_write+0x20/0x30 mm/kasan/shadow.c:37 instrument_atomic_write include/linux/instrumented.h:82 [inline] clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 l2cap_security_cfm+0x524/0xea0 net/bluetooth/l2cap_core.c:7357 hci_auth_cfm include/net/bluetooth/hci_core.h:2092 [inline] hci_auth_complete_evt+0x2e8/0xa4c net/bluetooth/hci_event.c:3514 hci_event_func net/bluetooth/hci_event.c:7511 [inline] hci_event_packet+0x650/0xe9c net/bluetooth/hci_event.c:7565 hci_rx_work+0x320/0xb18 net/bluetooth/hci_core.c:4070 process_one_work+0x7e8/0x155c kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3321 [inline] worker_thread+0x958/0xed8 kernel/workqueue.c:3402 kthread+0x5fc/0x75c kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:847 Fixes: d97c899bde33 ("Bluetooth: Introduce L2CAP channel callback for resuming") Reported-by: syzbot+e4d73b165c3892852d22@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686c12bd.a70a0220.29fe6c.0b13.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16ovpn: reset GSO metadata after decapsulationRalf Lici
The ovpn_netdev_write() function is responsible for injecting decapsulated and decrypted packets back into the local network stack. Prior to this patch, the skb could retain GSO metadata from the outer, encrypted tunnel packet. This original GSO metadata, relevant to the sender's transport context, becomes invalid and misleading for the tunnel/data path once the inner packet is exposed. Leaving this stale metadata intact causes internal GSO validation checks further down the kernel's network stack (validate_xmit_skb()) to fail, leading to packet drops. The reasons for these failures vary by protocol, for example: - for ICMP, no offload handler is registered; - for TCP and UDP, the respective offload handlers return errors when comparing skb->len to the outdated skb_shinfo(skb)->gso_size. By calling skb_gso_reset(skb) we ensure the inner packet is presented to gro_cells_receive() with a clean state, correctly indicating it is an individual packet from the perspective of the local stack. This change eliminates the "Driver has suspect GRO implementation, TCP performance may be compromised" warning and improves overall TCP performance by allowing GSO/GRO to function as intended on the decapsulated traffic. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/4 Tested-by: Gert Doering <gert@greenie.muc.de> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-16ovpn: reject unexpected netlink attributesAntonio Quartulli
Netlink ops do not expect all attributes to be always set, however this condition is not explicitly coded any where, leading the user to believe that all sent attributes are somewhat processed. Fix this behaviour by introducing explicit checks. For CMD_OVPN_PEER_GET and CMD_OVPN_KEY_GET directly open-code the needed condition in the related ops handlers. While for all other ops use attribute subsets in the ovpn.yaml spec file. Fixes: b7a63391aa98 ("ovpn: add basic netlink support") Reported-by: Ralf Lici <ralf@mandelbit.com> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/19 Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-16ovpn: propagate socket mark to skb in UDPRalf Lici
OpenVPN allows users to configure a FW mark on sockets used to communicate with other peers. The mark is set by means of the `SO_MARK` Linux socket option. However, in the ovpn UDP code path, the socket's `sk_mark` value is currently ignored and it is not propagated to outgoing `skbs`. This commit ensures proper inheritance of the field by setting `skb->mark` to `sk->sk_mark` before handing the `skb` to the network stack for transmission. Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31877.html Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-16tracing/probes: Avoid using params uninitialized in parse_btf_arg()Nathan Chancellor
After a recent change in clang to strengthen uninitialized warnings [1], it points out that in one of the error paths in parse_btf_arg(), params is used uninitialized: kernel/trace/trace_probe.c:660:19: warning: variable 'params' is uninitialized when used here [-Wuninitialized] 660 | return PTR_ERR(params); | ^~~~~~ Match many other NO_BTF_ENTRY error cases and return -ENOENT, clearing up the warning. Link: https://lore.kernel.org/all/20250715-trace_probe-fix-const-uninit-warning-v1-1-98960f91dd04@kernel.org/ Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2110 Fixes: d157d7694460 ("tracing/probes: Support BTF field access from $retval") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-15Merge branch 'mptcp-fix-fallback-related-races'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: fix fallback-related races This series contains 3 fixes somewhat related to various races we have while handling fallback. The root cause of the issues addressed here is that the check for "we can fallback to tcp now" and the related action are not atomic. That also applies to fallback due to MP_FAIL -- where the window race is even wider. Address the issue introducing an additional spinlock to bundle together all the relevant events, as per patch 1 and 2. These fixes can be backported up to v5.19 and v5.15. Note that mptcp_disconnect() unconditionally clears the fallback status (zeroing msk->flags) but don't touch the `allows_infinite_fallback` flag. Such issue is addressed in patch 3, and can be backported up to v5.17. ==================== Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-0-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: reset fallback status gracefully at disconnect() timePaolo Abeni
mptcp_disconnect() clears the fallback bit unconditionally, without touching the associated flags. The bit clear is safe, as no fallback operation can race with that -- all subflow are already in TCP_CLOSE status thanks to the previous FASTCLOSE -- but we need to consistently reset all the fallback related status. Also acquire the relevant lock, to avoid fouling static analyzers. Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-3-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: plug races between subflow fail and subflow creationPaolo Abeni
We have races similar to the one addressed by the previous patch between subflow failing and additional subflow creation. They are just harder to trigger. The solution is similar. Use a separate flag to track the condition 'socket state prevent any additional subflow creation' protected by the fallback lock. The socket fallback makes such flag true, and also receiving or sending an MP_FAIL option. The field 'allow_infinite_fallback' is now always touched under the relevant lock, we can drop the ONCE annotation on write. Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-2-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: make fallback action and fallback decision atomicPaolo Abeni
Syzkaller reported the following splat: WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Modules linked in: CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline] RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00 RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45 RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001 RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000 FS: 00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0 Call Trace: <IRQ> tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432 tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975 tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166 tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925 tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363 ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:469 [inline] ip_rcv_finish net/ipv4/ip_input.c:447 [inline] NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567 __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975 __netif_receive_skb+0x1f/0x120 net/core/dev.c:6088 process_backlog+0x301/0x1360 net/core/dev.c:6440 __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453 napi_poll net/core/dev.c:7517 [inline] net_rx_action+0xb44/0x1010 net/core/dev.c:7644 handle_softirqs+0x1d0/0x770 kernel/softirq.c:579 do_softirq+0x3f/0x90 kernel/softirq.c:480 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407 local_bh_enable include/linux/bottom_half.h:33 [inline] inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524 mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985 mptcp_check_listen_stop net/mptcp/mib.h:118 [inline] __mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000 mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066 inet_release+0xed/0x200 net/ipv4/af_inet.c:435 inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487 __sock_release+0xb3/0x270 net/socket.c:649 sock_close+0x1c/0x30 net/socket.c:1439 __fput+0x402/0xb70 fs/file_table.c:465 task_work_run+0x150/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xd4/0xe0 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x245/0x360 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc92f8a36ad Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf52802d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 00007ffcf52803a8 RCX: 00007fc92f8a36ad RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007fc92fae7ba0 R08: 0000000000000001 R09: 0000002800000000 R10: 00007fc92f700000 R11: 0000000000000246 R12: 00007fc92fae5fac R13: 00007fc92fae5fa0 R14: 0000000000026d00 R15: 0000000000026c51 </TASK> irq event stamp: 4068 hardirqs last enabled at (4076): [<ffffffff81544816>] __up_console_sem+0x76/0x80 kernel/printk/printk.c:344 hardirqs last disabled at (4085): [<ffffffff815447fb>] __up_console_sem+0x5b/0x80 kernel/printk/printk.c:342 softirqs last enabled at (3096): [<ffffffff840e1be0>] local_bh_enable include/linux/bottom_half.h:33 [inline] softirqs last enabled at (3096): [<ffffffff840e1be0>] inet_csk_listen_stop+0x2c0/0x1070 net/ipv4/inet_connection_sock.c:1524 softirqs last disabled at (3097): [<ffffffff813b6b9f>] do_softirq+0x3f/0x90 kernel/softirq.c:480 Since we need to track the 'fallback is possible' condition and the fallback status separately, there are a few possible races open between the check and the actual fallback action. Add a spinlock to protect the fallback related information and use it close all the possible related races. While at it also remove the too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect(): the field will be correctly cleared by subflow_finish_connect() if/when the connection will complete successfully. If fallback is not possible, as per RFC, reset the current subflow. Since the fallback operation can now fail and return value should be checked, rename the helper accordingly. Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status") Cc: stable@vger.kernel.org Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570 Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-1-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: fix multicast packets received countJiawen Wu
Multicast good packets received by PF rings that pass ethternet MAC address filtering are counted for rtnl_link_stats64.multicast. The counter is not cleared on read. Fix the duplicate counting on updating statistics. Fixes: 46b92e10d631 ("net: libwx: support hardware statistics") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/DA229A4F58B70E51+20250714015656.91772-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15Merge branch 'fix-rx-fatal-errors'Jakub Kicinski
Jiawen Wu says: ==================== Fix Rx fatal errors There are some fatal errors on the Rx NAPI path, which can cause the kernel to crash. Fix known issues and potential risks. The part of the patches has been mentioned before[1]. [1]: https://lore.kernel.org/all/C8A23A11DB646E60+20250630094102.22265-1-jiawenwu@trustnetic.com/ v1: https://lore.kernel.org/20250709064025.19436-1-jiawenwu@trustnetic.com ==================== Link: https://patch.msgid.link/20250714024755.17512-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: properly reset Rx ring descriptorJiawen Wu
When device reset is triggered by feature changes such as toggling Rx VLAN offload, wx->do_reset() is called to reinitialize Rx rings. The hardware descriptor ring may retain stale values from previous sessions. And only set the length to 0 in rx_desc[0] would result in building malformed SKBs. Fix it to ensure a clean slate after device reset. [ 549.186435] [ C16] ------------[ cut here ]------------ [ 549.186457] [ C16] kernel BUG at net/core/skbuff.c:2814! [ 549.186468] [ C16] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 549.186472] [ C16] CPU: 16 UID: 0 PID: 0 Comm: swapper/16 Kdump: loaded Not tainted 6.16.0-rc4+ #23 PREEMPT(voluntary) [ 549.186476] [ C16] Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 [ 549.186478] [ C16] RIP: 0010:__pskb_pull_tail+0x3ff/0x510 [ 549.186484] [ C16] Code: 06 f0 ff 4f 34 74 7b 4d 8b 8c 24 c8 00 00 00 45 8b 84 24 c0 00 00 00 e9 c8 fd ff ff 48 c7 44 24 08 00 00 00 00 e9 5e fe ff ff <0f> 0b 31 c0 e9 23 90 5b ff 41 f7 c6 ff 0f 00 00 75 bf 49 8b 06 a8 [ 549.186487] [ C16] RSP: 0018:ffffb391c0640d70 EFLAGS: 00010282 [ 549.186490] [ C16] RAX: 00000000fffffff2 RBX: ffff8fe7e4d40200 RCX: 00000000fffffff2 [ 549.186492] [ C16] RDX: ffff8fe7c3a4bf8e RSI: 0000000000000180 RDI: ffff8fe7c3a4bf40 [ 549.186494] [ C16] RBP: ffffb391c0640da8 R08: ffff8fe7c3a4c0c0 R09: 000000000000000e [ 549.186496] [ C16] R10: ffffb391c0640d88 R11: 000000000000000e R12: ffff8fe7e4d40200 [ 549.186497] [ C16] R13: 00000000fffffff2 R14: ffff8fe7fa01a000 R15: 00000000fffffff2 [ 549.186499] [ C16] FS: 0000000000000000(0000) GS:ffff8fef5ae40000(0000) knlGS:0000000000000000 [ 549.186502] [ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 549.186503] [ C16] CR2: 00007f77d81d6000 CR3: 000000051a032000 CR4: 0000000000750ef0 [ 549.186505] [ C16] PKRU: 55555554 [ 549.186507] [ C16] Call Trace: [ 549.186510] [ C16] <IRQ> [ 549.186513] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186517] [ C16] __skb_pad+0xc7/0xf0 [ 549.186523] [ C16] wx_clean_rx_irq+0x355/0x3b0 [libwx] [ 549.186533] [ C16] wx_poll+0x92/0x120 [libwx] [ 549.186540] [ C16] __napi_poll+0x28/0x190 [ 549.186544] [ C16] net_rx_action+0x301/0x3f0 [ 549.186548] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186551] [ C16] ? __raw_spin_lock_irqsave+0x1e/0x50 [ 549.186554] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186557] [ C16] ? wake_up_nohz_cpu+0x35/0x160 [ 549.186559] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186563] [ C16] handle_softirqs+0xf9/0x2c0 [ 549.186568] [ C16] __irq_exit_rcu+0xc7/0x130 [ 549.186572] [ C16] common_interrupt+0xb8/0xd0 [ 549.186576] [ C16] </IRQ> [ 549.186577] [ C16] <TASK> [ 549.186579] [ C16] asm_common_interrupt+0x22/0x40 [ 549.186582] [ C16] RIP: 0010:cpuidle_enter_state+0xc2/0x420 [ 549.186585] [ C16] Code: 00 00 e8 11 0e 5e ff e8 ac f0 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 0d ed 5c ff 45 84 ff 0f 85 40 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 549.186587] [ C16] RSP: 0018:ffffb391c0277e78 EFLAGS: 00000246 [ 549.186590] [ C16] RAX: ffff8fef5ae40000 RBX: 0000000000000003 RCX: 0000000000000000 [ 549.186591] [ C16] RDX: 0000007fde0faac5 RSI: ffffffff826e53f6 RDI: ffffffff826fa9b3 [ 549.186593] [ C16] RBP: ffff8fe7c3a20800 R08: 0000000000000002 R09: 0000000000000000 [ 549.186595] [ C16] R10: 0000000000000000 R11: 000000000000ffff R12: ffffffff82ed7a40 [ 549.186596] [ C16] R13: 0000007fde0faac5 R14: 0000000000000003 R15: 0000000000000000 [ 549.186601] [ C16] ? cpuidle_enter_state+0xb3/0x420 [ 549.186605] [ C16] cpuidle_enter+0x29/0x40 [ 549.186609] [ C16] cpuidle_idle_call+0xfd/0x170 [ 549.186613] [ C16] do_idle+0x7a/0xc0 [ 549.186616] [ C16] cpu_startup_entry+0x25/0x30 [ 549.186618] [ C16] start_secondary+0x117/0x140 [ 549.186623] [ C16] common_startup_64+0x13e/0x148 [ 549.186628] [ C16] </TASK> Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: fix the using of Rx buffer DMAJiawen Wu
The wx_rx_buffer structure contained two DMA address fields: 'dma' and 'page_dma'. However, only 'page_dma' was actually initialized and used to program the Rx descriptor. But 'dma' was uninitialized and used in some paths. This could lead to undefined behavior, including DMA errors or use-after-free, if the uninitialized 'dma' was used. Althrough such error has not yet occurred, it is worth fixing in the code. Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: remove duplicate page_pool_put_full_page()Jiawen Wu
page_pool_put_full_page() should only be invoked when freeing Rx buffers or building a skb if the size is too short. At other times, the pages need to be reused. So remove the redundant page put. In the original code, double free pages cause kernel panic: [ 876.949834] __irq_exit_rcu+0xc7/0x130 [ 876.949836] common_interrupt+0xb8/0xd0 [ 876.949838] </IRQ> [ 876.949838] <TASK> [ 876.949840] asm_common_interrupt+0x22/0x40 [ 876.949841] RIP: 0010:cpuidle_enter_state+0xc2/0x420 [ 876.949843] Code: 00 00 e8 d1 1d 5e ff e8 ac f0 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 cd fc 5c ff 45 84 ff 0f 85 40 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 876.949844] RSP: 0018:ffffaa7340267e78 EFLAGS: 00000246 [ 876.949845] RAX: ffff9e3f135be000 RBX: 0000000000000002 RCX: 0000000000000000 [ 876.949846] RDX: 000000cc2dc4cb7c RSI: ffffffff89ee49ae RDI: ffffffff89ef9f9e [ 876.949847] RBP: ffff9e378f940800 R08: 0000000000000002 R09: 00000000000000ed [ 876.949848] R10: 000000000000afc8 R11: ffff9e3e9e5a9b6c R12: ffffffff8a6d8580 [ 876.949849] R13: 000000cc2dc4cb7c R14: 0000000000000002 R15: 0000000000000000 [ 876.949852] ? cpuidle_enter_state+0xb3/0x420 [ 876.949855] cpuidle_enter+0x29/0x40 [ 876.949857] cpuidle_idle_call+0xfd/0x170 [ 876.949859] do_idle+0x7a/0xc0 [ 876.949861] cpu_startup_entry+0x25/0x30 [ 876.949862] start_secondary+0x117/0x140 [ 876.949864] common_startup_64+0x13e/0x148 [ 876.949867] </TASK> [ 876.949868] ---[ end trace 0000000000000000 ]--- [ 876.949869] ------------[ cut here ]------------ [ 876.949870] list_del corruption, ffffead40445a348->next is NULL [ 876.949873] WARNING: CPU: 14 PID: 0 at lib/list_debug.c:52 __list_del_entry_valid_or_report+0x67/0x120 [ 876.949875] Modules linked in: snd_hrtimer(E) bnep(E) binfmt_misc(E) amdgpu(E) squashfs(E) vfat(E) loop(E) fat(E) amd_atl(E) snd_hda_codec_realtek(E) intel_rapl_msr(E) snd_hda_codec_generic(E) intel_rapl_common(E) snd_hda_scodec_component(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) edac_mce_amd(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) amdxcp(E) kvm_amd(E) snd_hwdep(E) gpu_sched(E) drm_panel_backlight_quirks(E) cec(E) snd_pcm(E) drm_buddy(E) snd_seq_dummy(E) drm_ttm_helper(E) btusb(E) kvm(E) snd_seq_oss(E) btrtl(E) ttm(E) btintel(E) snd_seq_midi(E) btbcm(E) drm_exec(E) snd_seq_midi_event(E) i2c_algo_bit(E) snd_rawmidi(E) bluetooth(E) drm_suballoc_helper(E) irqbypass(E) snd_seq(E) ghash_clmulni_intel(E) sha512_ssse3(E) drm_display_helper(E) aesni_intel(E) snd_seq_device(E) rfkill(E) snd_timer(E) gf128mul(E) drm_client_lib(E) drm_kms_helper(E) snd(E) i2c_piix4(E) joydev(E) soundcore(E) wmi_bmof(E) ccp(E) k10temp(E) i2c_smbus(E) gpio_amdpt(E) i2c_designware_platform(E) gpio_generic(E) sg(E) [ 876.949914] i2c_designware_core(E) sch_fq_codel(E) parport_pc(E) drm(E) ppdev(E) lp(E) parport(E) fuse(E) nfnetlink(E) ip_tables(E) ext4 crc16 mbcache jbd2 sd_mod sfp mdio_i2c i2c_core txgbe ahci ngbe pcs_xpcs libahci libwx r8169 phylink libata realtek ptp pps_core video wmi [ 876.949933] CPU: 14 UID: 0 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: G W E 6.16.0-rc2+ #20 PREEMPT(voluntary) [ 876.949935] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE [ 876.949936] Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 [ 876.949936] RIP: 0010:__list_del_entry_valid_or_report+0x67/0x120 [ 876.949938] Code: 00 00 00 48 39 7d 08 0f 85 a6 00 00 00 5b b8 01 00 00 00 5d 41 5c e9 73 0d 93 ff 48 89 fe 48 c7 c7 a0 31 e8 89 e8 59 7c b3 ff <0f> 0b 31 c0 5b 5d 41 5c e9 57 0d 93 ff 48 89 fe 48 c7 c7 c8 31 e8 [ 876.949940] RSP: 0018:ffffaa73405d0c60 EFLAGS: 00010282 [ 876.949941] RAX: 0000000000000000 RBX: ffffead40445a348 RCX: 0000000000000000 [ 876.949942] RDX: 0000000000000105 RSI: 0000000000000001 RDI: 00000000ffffffff [ 876.949943] RBP: 0000000000000000 R08: 000000010006dfde R09: ffffffff8a47d150 [ 876.949944] R10: ffffffff8a47d150 R11: 0000000000000003 R12: dead000000000122 [ 876.949945] R13: ffff9e3e9e5af700 R14: ffffead40445a348 R15: ffff9e3e9e5af720 [ 876.949946] FS: 0000000000000000(0000) GS:ffff9e3f135be000(0000) knlGS:0000000000000000 [ 876.949947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 876.949948] CR2: 00007fa58b480048 CR3: 0000000156724000 CR4: 0000000000750ef0 [ 876.949949] PKRU: 55555554 [ 876.949950] Call Trace: [ 876.949951] <IRQ> [ 876.949952] __rmqueue_pcplist+0x53/0x2c0 [ 876.949955] alloc_pages_bulk_noprof+0x2e0/0x660 [ 876.949958] __page_pool_alloc_pages_slow+0xa9/0x400 [ 876.949961] page_pool_alloc_pages+0xa/0x20 [ 876.949963] wx_alloc_rx_buffers+0xd7/0x110 [libwx] [ 876.949967] wx_clean_rx_irq+0x262/0x430 [libwx] [ 876.949971] wx_poll+0x92/0x130 [libwx] [ 876.949975] __napi_poll+0x28/0x190 [ 876.949977] net_rx_action+0x301/0x3f0 [ 876.949980] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949981] ? profile_tick+0x30/0x70 [ 876.949983] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949984] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949986] ? timerqueue_add+0xa3/0xc0 [ 876.949988] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949989] ? __raise_softirq_irqoff+0x16/0x70 [ 876.949991] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949993] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949994] ? wx_msix_clean_rings+0x41/0x50 [libwx] [ 876.949998] handle_softirqs+0xf9/0x2c0 Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>