summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-08-27Merge branch 'introduce-refcount_t-for-reference-counting-of-rose_neigh'Jakub Kicinski
Takamitsu Iwai says: ==================== Introduce refcount_t for reference counting of rose_neigh The current implementation of rose_neigh uses 'use' and 'count' field of type unsigned short as a reference count. This approach lacks atomicity, leading to potential race conditions. As a result, syzbot has reported slab-use-after-free errors due to unintended removals. This series introduces refcount_t for reference counting to ensure atomicity and prevent race conditions. The patches are structured as follows: 1. Refactor rose_remove_neigh() to separate removal and freeing operations 2. Convert 'use' field to refcount_t for appropriate reference counting 3. Include references from rose_node to 'use' field These changes should resolve the reported slab-use-after-free issues and improve the overall stability of the ROSE network layer. v1: https://lore.kernel.org/20250820174707.83372-1-takamitz@amazon.co.jp ==================== Link: https://patch.msgid.link/20250823085857.47674-1-takamitz@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-27net: rose: include node references in rose_neigh refcountTakamitsu Iwai
Current implementation maintains two separate reference counting mechanisms: the 'count' field in struct rose_neigh tracks references from rose_node structures, while the 'use' field (now refcount_t) tracks references from rose_sock. This patch merges these two reference counting systems using 'use' field for proper reference management. Specifically, this patch adds incrementing and decrementing of rose_neigh->use when rose_neigh->count is incremented or decremented. This patch also modifies rose_rt_free(), rose_rt_device_down() and rose_clear_route() to properly release references to rose_neigh objects before freeing a rose_node through rose_remove_node(). These changes ensure rose_neigh structures are properly freed only when all references, including those from rose_node structures, are released. As a result, this resolves a slab-use-after-free issue reported by Syzbot. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+942297eecf7d2d61d1f1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=942297eecf7d2d61d1f1 Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250823085857.47674-4-takamitz@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-27net: rose: convert 'use' field to refcount_tTakamitsu Iwai
The 'use' field in struct rose_neigh is used as a reference counter but lacks atomicity. This can lead to race conditions where a rose_neigh structure is freed while still being referenced by other code paths. For example, when rose_neigh->use becomes zero during an ioctl operation via rose_rt_ioctl(), the structure may be removed while its timer is still active, potentially causing use-after-free issues. This patch changes the type of 'use' from unsigned short to refcount_t and updates all code paths to use rose_neigh_hold() and rose_neigh_put() which operate reference counts atomically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250823085857.47674-3-takamitz@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-27net: rose: split remove and free operations in rose_remove_neigh()Takamitsu Iwai
The current rose_remove_neigh() performs two distinct operations: 1. Removes rose_neigh from rose_neigh_list 2. Frees the rose_neigh structure Split these operations into separate functions to improve maintainability and prepare for upcoming refcount_t conversion. The timer cleanup remains in rose_remove_neigh() because free operations can be called from timer itself. This patch introduce rose_neigh_put() to handle the freeing of rose_neigh structures and modify rose_remove_neigh() to handle removal only. Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250823085857.47674-2-takamitz@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-27io_uring/kbuf: fix signedness in this_len calculationQingyue Zhang
When importing and using buffers, buf->len is considered unsigned. However, buf->len is converted to signed int when committing. This can lead to unexpected behavior if the buffer is large enough to be interpreted as a negative value. Make min_t calculation unsigned. Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Co-developed-by: Suoxing Zhang <aftern00n@qq.com> Signed-off-by: Suoxing Zhang <aftern00n@qq.com> Signed-off-by: Qingyue Zhang <chunzhennn@qq.com> Link: https://lore.kernel.org/r/tencent_4DBB3674C0419BEC2C0C525949DA410CA307@qq.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-27ALSA: hda/tas2781: Fix EFI name for calibration beginning with 1 instead of 0Shenghao Ding
A bug reported by one of my customers that EFI name beginning with 0 instead of 1. Fixes: 4fe238513407 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib") Signed-off-by: Shenghao Ding <shenghao-ding@ti.com> Link: https://patch.msgid.link/20250827043404.644-1-shenghao-ding@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-27fuse: Block access to folio overlimitEdward Adam Davis
syz reported a slab-out-of-bounds Write in fuse_dev_do_write. When the number of bytes to be retrieved is truncated to the upper limit by fc->max_pages and there is an offset, the oob is triggered. Add a loop termination condition to prevent overruns. Fixes: 3568a9569326 ("fuse: support large folios for retrieves") Reported-by: syzbot+2d215d165f9354b9c4ea@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2d215d165f9354b9c4ea Tested-by: syzbot+2d215d165f9354b9c4ea@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-08-27netfilter: conntrack: helper: Replace -EEXIST by -EBUSYPhil Sutter
The helper registration return value is passed-through by module_init callbacks which modprobe confuses with the harmless -EEXIST returned when trying to load an already loaded module. Make sure modprobe fails so users notice their helper has not been registered and won't work. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-08-27netfilter: br_netfilter: do not check confirmed bit in br_nf_local_in() ↵Wang Liang
after confirm When send a broadcast packet to a tap device, which was added to a bridge, br_nf_local_in() is called to confirm the conntrack. If another conntrack with the same hash value is added to the hash table, which can be triggered by a normal packet to a non-bridge device, the below warning may happen. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 96 at net/bridge/br_netfilter_hooks.c:632 br_nf_local_in+0x168/0x200 CPU: 1 UID: 0 PID: 96 Comm: tap_send Not tainted 6.17.0-rc2-dirty #44 PREEMPT(voluntary) RIP: 0010:br_nf_local_in+0x168/0x200 Call Trace: <TASK> nf_hook_slow+0x3e/0xf0 br_pass_frame_up+0x103/0x180 br_handle_frame_finish+0x2de/0x5b0 br_nf_hook_thresh+0xc0/0x120 br_nf_pre_routing_finish+0x168/0x3a0 br_nf_pre_routing+0x237/0x5e0 br_handle_frame+0x1ec/0x3c0 __netif_receive_skb_core+0x225/0x1210 __netif_receive_skb_one_core+0x37/0xa0 netif_receive_skb+0x36/0x160 tun_get_user+0xa54/0x10c0 tun_chr_write_iter+0x65/0xb0 vfs_write+0x305/0x410 ksys_write+0x60/0xd0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace 0000000000000000 ]--- To solve the hash conflict, nf_ct_resolve_clash() try to merge the conntracks, and update skb->_nfct. However, br_nf_local_in() still use the old ct from local variable 'nfct' after confirm(), which leads to this warning. If confirm() does not insert the conntrack entry and return NF_DROP, the warning may also occur. There is no need to reserve the WARN_ON_ONCE, just remove it. Link: https://lore.kernel.org/netdev/20250820043329.2902014-1-wangliang74@huawei.com/ Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Wang Liang <wangliang74@huawei.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-08-27x86/cpu/topology: Use initial APIC ID from XTOPOLOGY leaf on AMD/HYGONK Prateek Nayak
Prior to the topology parsing rewrite and the switchover to the new parsing logic for AMD processors in c749ce393b8f ("x86/cpu: Use common topology code for AMD"), the initial_apicid on these platforms was: - First initialized to the LocalApicId from CPUID leaf 0x1 EBX[31:24]. - Then overwritten by the ExtendedLocalApicId in CPUID leaf 0xb EDX[31:0] on processors that supported topoext. With the new parsing flow introduced in f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser"), parse_8000_001e() now unconditionally overwrites the initial_apicid already parsed during cpu_parse_topology_ext(). Although this has not been a problem on baremetal platforms, on virtualized AMD guests that feature more than 255 cores, QEMU zeros out the CPUID leaf 0x8000001e on CPUs with CoreID > 255 to prevent collision of these IDs in EBX[7:0] which can only represent a maximum of 255 cores [1]. This results in the following FW_BUG being logged when booting a guest with more than 255 cores: [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200 AMD64 Architecture Programmer's Manual Volume 2: System Programming Pub. 24593 Rev. 3.42 [2] Section 16.12 "x2APIC_ID" mentions the Extended Enumeration leaf 0xb (Fn0000_000B_EDX[31:0])(which was later superseded by the extended leaf 0x80000026) provides the full x2APIC ID under all circumstances unlike the one reported by CPUID leaf 0x8000001e EAX which depends on the mode in which APIC is configured. Rely on the APIC ID parsed during cpu_parse_topology_ext() from CPUID leaf 0x80000026 or 0xb and only use the APIC ID from leaf 0x8000001e if cpu_parse_topology_ext() failed (has_topoext is false). On platforms that support the 0xb leaf (Zen2 or later, AMD guests on QEMU) or the extended leaf 0x80000026 (Zen4 or later), the initial_apicid is now set to the value parsed from EDX[31:0]. On older AMD/Hygon platforms that do not support the 0xb leaf but support the TOPOEXT extension (families 0x15, 0x16, 0x17[Zen1], and Hygon), retain current behavior where the initial_apicid is set using the 0x8000001e leaf. Issue debugged by Naveen N Rao (AMD) <naveen@kernel.org> and Sairaj Kodilkar <sarunkod@amd.com>. [ bp: Massage commit message. ] Fixes: c749ce393b8f ("x86/cpu: Use common topology code for AMD") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Naveen N Rao (AMD) <naveen@kernel.org> Cc: stable@vger.kernel.org Link: https://github.com/qemu/qemu/commit/35ac5dfbcaa4b [1] Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [2] Link: https://lore.kernel.org/20250825075732.10694-2-kprateek.nayak@amd.com
2025-08-27wifi: mt76: fix linked list corruptionFelix Fietkau
Never leave scheduled wcid entries on the temporary on-stack list Fixes: 0b3be9d1d34e ("wifi: mt76: add separate tx scheduling queue for off-channel tx") Link: https://patch.msgid.link/20250827085352.51636-6-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: free pending offchannel tx frames on wcid cleanupFelix Fietkau
Avoid leaking them or keeping the wcid on the tx list Fixes: 0b3be9d1d34e ("wifi: mt76: add separate tx scheduling queue for off-channel tx") Link: https://patch.msgid.link/20250827085352.51636-5-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7915: fix list corruption after hardware restartFelix Fietkau
Since stations are recreated from scratch, all lists that wcids are added to must be cleared before calling ieee80211_restart_hw. Set wcid->sta = 0 for each wcid entry in order to ensure that they are not added again before they are ready. Fixes: 8a55712d124f ("wifi: mt76: mt7915: enable full system reset support") Link: https://patch.msgid.link/20250827085352.51636-4-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7996: add missing check for rx wcid entriesFelix Fietkau
Non-station wcid entries must not be passed to the rx functions. In case of the global wcid entry, it could even lead to corruption in the wcid array due to pointer being casted to struct mt7996_sta_link using container_of. Fixes: 7464b12b7d92 ("wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO") Link: https://patch.msgid.link/20250827085352.51636-3-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: do not add non-sta wcid entries to the poll listFelix Fietkau
Polling and airtime reporting is valid for station entries only Link: https://patch.msgid.link/20250827085352.51636-2-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7996: fix crash on some tx status reportsFelix Fietkau
When a wcid can't be found, link_sta can be stale from a previous batch. The code currently assumes that if link_sta is set, wcid is also non-zero. Fix wcid NULL pointer dereference by resetting link_sta when a wcid entry can't be found. Fixes: 62da647a2b20 ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()") Link: https://patch.msgid.link/20250827085352.51636-1-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7996: use the correct vif link for scanning/rocChad Monroe
restore fix which was dropped during MLO rework Fixes: f0b0b239b8f3 ("wifi: mt76: mt7996: rework mt7996_mac_write_txwi() for MLO support") Signed-off-by: Chad Monroe <chad@monroe.io> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/180fffd409aa57f535a3d2c1951e41ae398ce09e.1754659732.git.chad@monroe.io Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7996: disable beacons when going offchannelFelix Fietkau
Avoid leaking beacons on unrelated channels during scanning/roc Fixes: c56d6edebc1f ("wifi: mt76: mt7996: use emulated hardware scan support") Reported-by: Chad Monroe <chad.monroe@adtran.com> Link: https://patch.msgid.link/20250813121106.81559-1-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: prevent non-offchannel mgmt tx during scan/rocFelix Fietkau
Only put probe request packets in the offchannel queue if IEEE80211_TX_CTRL_DONT_USE_RATE_MASK is set and IEEE80211_TX_CTL_TX_OFFCHAN is unset. Fixes: 0b3be9d1d34e ("wifi: mt76: add separate tx scheduling queue for off-channel tx") Reported-by: Chad Monroe <chad.monroe@adtran.com> Link: https://patch.msgid.link/20250813121106.81559-2-nbd@nbd.name Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7925: skip EHT MLD TLV on non-MLD and pass conn_state for sta_cmdMing Yen Hsieh
Return early in mt7925_mcu_sta_eht_mld_tlv() for non-MLD vifs to avoid bogus MLD TLVs, and pass the proper connection state to sta_basic TLV. Cc: stable@vger.kernel.org Fixes: cb1353ef3473 ("wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd") Reported-by: Tal Inbar <inbartdev@gmail.com> Tested-by: Tal Inbar <inbartdev@gmail.com> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250818030201.997940-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7925u: use connac3 tx aggr check in tx completeMing Yen Hsieh
MT7925 is a connac3 device; using the connac2 helper mis-parses TXWI and breaks AMPDU/BA accounting. Use the connac3-specific helper mt7925_tx_check_aggr() instead, Cc: stable@vger.kernel.org Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips") Reported-by: Nick Morrow <morrownr@gmail.com> Tested-by: Nick Morrow <morrownr@gmail.com> Tested-on: Netgear A9000 USB WiFi adapter Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250818020203.992338-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7925: fix the wrong bss cleanup for SAPMing Yen Hsieh
When in SAP mode, if a STA disconnect, the SAP's BSS should not be cleared. Fixes: 0ebb60da8416 ("wifi: mt76: mt7925: adjust rm BSS flow to prevent next connection failure") Cc: stable@vger.kernel.org Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250728052612.39751-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7925: fix locking in mt7925_change_vif_links()Harshit Mogalapalli
&dev->mt76.mutex lock is taken using mt792x_mutex_acquire(dev) but not released in one of the error paths, add the unlock to fix it. Fixes: 5cd0bd815c8a ("wifi: mt76: mt7925: fix NULL deref check in mt7925_change_vif_links") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202503031055.3ZRqxhAl-lkp@intel.com/ Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://patch.msgid.link/20250727140416.1153406-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7921: don't disconnect when CSA to DFS chanJanusz Dziedzic
When station mode, don't disconnect when we get channel switch from AP to DFS channel. Most APs send CSA request after pass background CAC. In other case we should disconnect after detect beacon miss. Without patch when we get CSA to DFS channel get: "kernel: wlo1: preparing for channel switch failed, disconnecting" Fixes: 8aa2f59260eb ("wifi: mt76: mt7921: introduce CSA support") Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com> Link: https://patch.msgid.link/20250716165443.28354-1-janusz.dziedzic@gmail.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27wifi: mt76: mt7996: Initialize hdr before passing to skb_put_data()Nathan Chancellor
A new warning in clang [1] points out a couple of places where a hdr variable is not initialized then passed along to skb_put_data(). drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:1894:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer] 1894 | skb_put_data(skb, &hdr, sizeof(hdr)); | ^~~ drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:3386:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer] 3386 | skb_put_data(skb, &hdr, sizeof(hdr)); | ^~~ Zero initialize these headers as done in other places in the driver when there is nothing stored in the header. Cc: stable@vger.kernel.org Fixes: 98686cd21624 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices") Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d441f19b319e [1] Closes: https://github.com/ClangBuiltLinux/linux/issues/2104 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20250715-mt7996-fix-uninit-const-pointer-v1-1-b5d8d11d7b78@kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-08-27x86/microcode/AMD: Handle the case of no BIOS microcodeBorislav Petkov (AMD)
Machines can be shipped without any microcode in the BIOS. Which means, the microcode patch revision is 0. Handle that gracefully. Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID") Reported-by: Vítek Vávra <vit.vavra.kh@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2025-08-27Merge tag 'kvm-x86-fixes-6.17-rc7' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes and a selftest fix for 6.17-rcN - Use array_index_nospec() to sanitize the target vCPU ID when handling PV IPIs and yields as the ID is guest-controlled. - Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as the common case, by far, is that at least one CPU will have entered the VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where the set of have_run_cpus is empty. - Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var() to fix a collision with linux/overflow.h. The collision generates compiler warnings due to the two macros having different implementations.
2025-08-27ALSA: usb-audio: move mixer_quirks' min_mute into common quirkCryolitia PukNgae
We have found more and more devices that have the same problem, that the mixer's minimum value is muted. Accroding to pipewire's MR[1] and Arch Linux wiki[2], this should be a very common problem in USB audio devices. Move the quirk into common quirk,as a preparation of more devices' quirk's patch coming on the road[3]. 1. https://gitlab.freedesktop.org/pipewire/pipewire/-/merge_requests/2514 2. https://wiki.archlinux.org/index.php?title=PipeWire&oldid=804138#No_sound_from_USB_DAC_until_30%_volume 3. On the road, in the physical sense. We have been buying ton of these devices for testing the problem. Tested-by: Guoli An <anguoli@uniontech.com> Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com> Link: https://patch.msgid.link/20250827-sound-quirk-min-mute-v1-1-4717aa8a4f6a@uniontech.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-26net: hv_netvsc: fix loss of early receive events from host during channel open.Dipayaan Roy
The hv_netvsc driver currently enables NAPI after opening the primary and subchannels. This ordering creates a race: if the Hyper-V host places data in the host -> guest ring buffer and signals the channel before napi_enable() has been called, the channel callback will run but napi_schedule_prep() will return false. As a result, the NAPI poller never gets scheduled, the data in the ring buffer is not consumed, and the receive queue may remain permanently stuck until another interrupt happens to arrive. Fix this by enabling NAPI and registering it with the RX/TX queues before vmbus channel is opened. This guarantees that any early host signal after open will correctly trigger NAPI scheduling and the ring buffer will be drained. Fixes: 76bb5db5c749d ("netvsc: fix use after free on module removal") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250825115627.GA32189@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26Merge branch 'net-stmmac-xgmac-minor-fixes'Jakub Kicinski
Rohan G Thomas says: ==================== net: stmmac: xgmac: Minor fixes This patch series includes following minor fixes for stmmac dwxgmac driver: 1. Disable Rx FIFO overflow interrupt for dwxgmac 2. Correct supported speed modes for dwxgmac 3. Check for coe-unsupported flag before setting CIC bit of Tx Desc3 in the AF_XDP flow v2: https://lore.kernel.org/20250816-xgmac-minor-fixes-v2-0-699552cf8a7f@altera.com v1: https://lore.kernel.org/20250714-xgmac-minor-fixes-v1-0-c34092a88a72@altera.com ==================== Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-0-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net: stmmac: Set CIC bit only for TX queues with COERohan G Thomas
Currently, in the AF_XDP transmit paths, the CIC bit of TX Desc3 is set for all packets. Setting this bit for packets transmitting through queues that don't support checksum offloading causes the TX DMA to get stuck after transmitting some packets. This patch ensures the CIC bit of TX Desc3 is set only if the TX queue supports checksum offloading. Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-3-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net: stmmac: xgmac: Correct supported speed modesRohan G Thomas
Correct supported speed modes as per the XGMAC databook. Commit 9cb54af214a7 ("net: stmmac: Fix IP-cores specific MAC capabilities") removes support for 10M, 100M and 1000HD. 1000HD is not supported by XGMAC IP, but it does support 10M and 100M FD mode for XGMAC version >= 2_20, and it also supports 10M and 100M HD mode if the HDSEL bit is set in the MAC_HW_FEATURE0 reg. This commit enables support for 10M and 100M speed modes for XGMAC IP based on XGMAC version and MAC capabilities. Fixes: 9cb54af214a7 ("net: stmmac: Fix IP-cores specific MAC capabilities") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-2-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net: stmmac: xgmac: Do not enable RX FIFO Overflow interruptsRohan G Thomas
Enabling RX FIFO Overflow interrupts is counterproductive and causes an interrupt storm when RX FIFO overflows. Disabling this interrupt has no side effect and eliminates interrupt storms when the RX FIFO overflows. Commit 8a7cb245cf28 ("net: stmmac: Do not enable RX FIFO overflow interrupts") disables RX FIFO overflow interrupts for DWMAC4 IP and removes the corresponding handling of this interrupt. This patch is doing the same thing for XGMAC IP. Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-1-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26Merge branch 'mlx5-misc-fixes-2025-08-25'Jakub Kicinski
Mark Bloch says: ==================== mlx5 misc fixes 2025-08-25 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. v1: https://lore.kernel.org/20250824083944.523858-1-mbloch@nvidia.com ==================== Link: https://patch.msgid.link/20250825143435.598584-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5e: Set local Xoff after FW updateAlexei Lazar
The local Xoff value is being set before the firmware (FW) update. In case of a failure where the FW is not updated with the new value, there is no fallback to the previous value. Update the local Xoff value after the FW has been successfully set. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-12-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5e: Update and set Xon/Xoff upon port speed setAlexei Lazar
Xon/Xoff sizes are derived from calculations that include the port speed. These settings need to be updated and applied whenever the port speed is changed. The port speed is typically set after the physical link goes down and is negotiated as part of the link-up process between the two connected interfaces. Xon/Xoff parameters being updated at the point where the new negotiated speed is established. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5e: Update and set Xon/Xoff upon MTU setAlexei Lazar
Xon/Xoff sizes are derived from calculation that include the MTU size. Set Xon/Xoff when MTU is set. If Xon/Xoff fails, set the previous MTU. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: Prevent flow steering mode changes in switchdev modeMoshe Shemesh
Changing flow steering modes is not allowed when eswitch is in switchdev mode. This fix ensures that any steering mode change, including to firmware steering, is correctly blocked while eswitch mode is switchdev. Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: Nack sync reset when SFs are presentMoshe Shemesh
If PF (Physical Function) has SFs (Sub-Functions), since the SFs are not taking part in the synchronization flow, sync reset can lead to fatal error on the SFs, as the function will be closed unexpectedly from the SF point of view. Add a check to prevent sync reset when there are SFs on a PF device which is not ECPF, as ECPF is teardowned gracefully before reset. Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: Fix lockdep assertion on sync reset unload eventMoshe Shemesh
Fix lockdep assertion triggered during sync reset unload event. When the sync reset flow is initiated using the devlink reload fw_activate option, the PF already holds the devlink lock while handling unload event. In this case, delegate sync reset unload event handling back to the devlink callback process to avoid double-locking and resolve the lockdep warning. Kernel log: WARNING: CPU: 9 PID: 1578 at devl_assert_locked+0x31/0x40 [...] Call Trace: <TASK> mlx5_unload_one_devl_locked+0x2c/0xc0 [mlx5_core] mlx5_sync_reset_unload_event+0xaf/0x2f0 [mlx5_core] process_one_work+0x222/0x640 worker_thread+0x199/0x350 kthread+0x10b/0x230 ? __pfx_worker_thread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x8e/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 7a9770f1bfea ("net/mlx5: Handle sync reset unload event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: Reload auxiliary drivers on fw_activateMoshe Shemesh
The devlink reload fw_activate command performs firmware activation followed by driver reload, while devlink reload driver_reinit triggers only driver reload. However, the driver reload logic differs between the two modes, as on driver_reinit mode mlx5 also reloads auxiliary drivers, while in fw_activate mode the auxiliary drivers are suspended where applicable. Additionally, following the cited commit, if the device has multiple PFs, the behavior during fw_activate may vary between PFs: one PF may suspend auxiliary drivers, while another reloads them. Align devlink dev reload fw_activate behavior with devlink dev reload driver_reinit, to reload all auxiliary drivers. Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: HWS, Fix pattern destruction in mlx5hws_pat_get_pattern error pathLama Kayal
In mlx5hws_pat_get_pattern(), when mlx5hws_pat_add_pattern_to_cache() fails, the function attempts to clean up the pattern created by mlx5hws_cmd_header_modify_pattern_create(). However, it incorrectly uses *pattern_id which hasn't been set yet, instead of the local ptrn_id variable that contains the actual pattern ID. This results in attempting to destroy a pattern using uninitialized data from the output parameter, rather than the valid pattern ID returned by the firmware. Use ptrn_id instead of *pattern_id in the cleanup path to properly destroy the created pattern. Fixes: aefc15a0fa1c ("net/mlx5: HWS, added modify header pattern and args handling") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: HWS, Fix uninitialized variables in mlx5hws_pat_calc_nop error flowLama Kayal
In mlx5hws_pat_calc_nop(), src_field and dst_field are passed to hws_action_modify_get_target_fields() which should set their values. However, if an invalid action type is encountered, these variables remain uninitialized and are later used to update prev_src_field and prev_dst_field. Initialize both variables to INVALID_FIELD to ensure they have defined values in all code paths. Fixes: 01e035fd0380 ("net/mlx5: HWS, handle modify header actions dependency") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: HWS, Fix memory leak in hws_action_get_shared_stc_nic error flowLama Kayal
When an invalid stc_type is provided, the function allocates memory for shared_stc but jumps to unlock_and_out without freeing it, causing a memory leak. Fix by jumping to free_shared_stc label instead to ensure proper cleanup. Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26net/mlx5: HWS, Fix memory leak in hws_pool_buddy_init error pathLama Kayal
In the error path of hws_pool_buddy_init(), the buddy allocator cleanup doesn't free the allocator structure itself, causing a memory leak. Add the missing kfree() to properly release all allocated memory. Fixes: c61afff94373 ("net/mlx5: HWS, added memory management handling") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-08-25 (ice, ixgbe) For ice: Emil adds a check to ensure auxiliary device was created before tear down to prevent NULL a pointer dereference. Jake reworks flow for failed Tx scheduler configuration to allow for proper recovery and operation. He also adjusts ice_adapter index for E825C devices as use of DSN is incompatible with this device. Michal corrects tracking of buffer allocation failure in ice_clean_rx_irq(). For ixgbe: Jedrzej adds __packed attribute to ixgbe_orom_civd_info to compatibility with device OROM data. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: fix ixgbe_orom_civd_info struct layout ice: fix incorrect counter for buffer allocation failures ice: use fixed adapter index for E825C embedded devices ice: don't leave device non-functional if Tx scheduler config fails ice: fix NULL pointer dereference in ice_unplug_aux_dev() on reset ==================== Link: https://patch.msgid.link/20250825215019.3442873-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26Merge branch 'bnxt_en-3-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: 3 bug fixes The first one fixes a memory corruption issue that can happen when FW resources change during ifdown with TCs created. The next two fix FW resource reservation logic for TX rings and stats context. ==================== Link: https://patch.msgid.link/20250825175927.459987-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26bnxt_en: Fix stats context reservation logicMichael Chan
The HW resource reservation logic allows the L2 driver to use the RoCE resources if the RoCE driver is not registered. When calculating the stats contexts available for L2, we should not blindly subtract the stats contexts reserved for RoCE unless the RoCE driver is registered. This bug may cause the L2 rings to be less than the number requested when we are close to running out of stats contexts. Fixes: 2e4592dc9bee ("bnxt_en: Change MSIX/NQs allocation policy") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26bnxt_en: Adjust TX rings if reservation is less than requestedMichael Chan
Before we accept an ethtool request to increase a resource (such as rings), we call the FW to check that the requested resource is likely available first before we commit. But it is still possible that the actual reservation or allocation can fail. The existing code is missing the logic to adjust the TX rings in case the reserved TX rings are less than requested. Add a warning message (a similar message for RX rings already exists) and add the logic to adjust the TX rings. Without this fix, the number of TX rings reported to the stack can exceed the actual TX rings and ethtool -l will report more than the actual TX rings. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26bnxt_en: Fix memory corruption when FW resources change during ifdownSreekanth Reddy
bnxt_set_dflt_rings() assumes that it is always called before any TC has been created. So it doesn't take bp->num_tc into account and assumes that it is always 0 or 1. In the FW resource or capability change scenario, the FW will return flags in bnxt_hwrm_if_change() that will cause the driver to reinitialize and call bnxt_cancel_reservations(). This will lead to bnxt_init_dflt_ring_mode() calling bnxt_set_dflt_rings() and bp->num_tc may be greater than 1. This will cause bp->tx_ring[] to be sized too small and cause memory corruption in bnxt_alloc_cp_rings(). Fix it by properly scaling the TX rings by bp->num_tc in the code paths mentioned above. Add 2 helper functions to determine bp->tx_nr_rings and bp->tx_nr_rings_per_tc. Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>