linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-07-10	Merge tag 'wireless-next-2025-07-10' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Quite a bit more work, notably: - mt76: firmware recovery improvements, MLO work - iwlwifi: use embedded PNVM in (to be released) FW images to fix compatibility issues - cfg80211/mac80211: extended regulatory info support (6 GHz) - cfg80211: use "faux device" for regulatory * tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (48 commits) wifi: mac80211: don't complete management TX on SAE commit wifi: cfg80211/mac80211: implement dot11ExtendedRegInfoSupport wifi: mac80211: send extended MLD capa/ops if AP has it wifi: mac80211: copy first_part into HW scan wifi: cfg80211: add a flag for the first part of a scan wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code wifi: cfg80211: only verify part of Extended MLD Capabilities wifi: nl80211: make nl80211_check_scan_flags() type safe wifi: cfg80211: hide scan internals wifi: mac80211: fix deactivated link CSA wifi: mac80211: add mandatory bitrate support for 6 GHz wifi: mac80211: remove spurious blank line wifi: mac80211: verify state before connection wifi: mac80211: avoid weird state in error path wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v4 wifi: iwlwifi: bump minimum API version in BZ wifi: iwlwifi: mvm: remove unneeded argument wifi: iwlwifi: mvm: remove MLO GTK rekey code wifi: iwlwifi: pcie: rename iwl_pci_gen1_2_probe() argument wifi: iwlwifi: match discrete/integrated to fix some names ... ==================== Link: https://patch.msgid.link/20250710123113.24878-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	Merge tag 'wireless-2025-07-10' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a number of fixes still: - mt76 (hadn't sent any fixes so far) - RCU - scanning - decapsulation offload - interface combinations - rt2x00: build fix (bad function pointer prototype) - cfg80211: prevent A-MSDU flipping attacks in mesh - zd1211rw: prevent race ending with NULL ptr deref - cfg80211/mac80211: more S1G fixes - mwifiex: avoid WARN on certain RX frames - mac80211: - avoid stack data leak in WARN cases - fix non-transmitted BSSID search (on certain multi-BSSID APs) - always initialize key list so driver iteration won't crash - fix monitor interface in device restart - fix __free() annotation usage * tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (26 commits) wifi: mac80211: add the virtual monitor after reconfig complete wifi: mac80211: always initialize sdata::key_list wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs() wifi: mt76: mt792x: Limit the concurrent STA and SoftAP to operate on the same channel wifi: mt76: mt7925: Fix null-ptr-deref in mt7925_thermal_init() wifi: mt76: fix queue assignment for deauth packets wifi: mt76: add a wrapper for wcid access with validation wifi: mt76: mt7921: prevent decap offload config before STA initialization wifi: mt76: mt7925: prevent NULL pointer dereference in mt7925_sta_set_decap_offload() wifi: mt76: mt7925: fix incorrect scan probe IE handling for hw_scan wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan wifi: mt76: mt7925: fix the wrong config for tx interrupt wifi: mt76: Remove RCU section in mt7996_mac_sta_rc_work() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl_fixed() wifi: mt76: Move RCU section in mt7996_mcu_set_fixed_field() wifi: mt76: Assume __mt76_connac_mcu_alloc_sta_req runs in atomic context wifi: prevent A-MSDU attacks in mesh networks wifi: rt2x00: fix remove callback type mismatch wifi: mac80211: reject VHT opmode for unsupported channel widths ... ==================== Link: https://patch.msgid.link/20250710122212.24272-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()	Eric Dumazet
	syzbot found a potential access to uninit-value in nf_flow_pppoe_proto() Blamed commit forgot the Ethernet header. BUG: KMSAN: uninit-value in nf_flow_offload_inet_hook+0x7e4/0x940 net/netfilter/nf_flow_table_inet.c:27 nf_flow_offload_inet_hook+0x7e4/0x940 net/netfilter/nf_flow_table_inet.c:27 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline] nf_hook_slow+0xe1/0x3d0 net/netfilter/core.c:623 nf_hook_ingress include/linux/netfilter_netdev.h:34 [inline] nf_ingress net/core/dev.c:5742 [inline] __netif_receive_skb_core+0x4aff/0x70c0 net/core/dev.c:5837 __netif_receive_skb_one_core net/core/dev.c:5975 [inline] __netif_receive_skb+0xcc/0xac0 net/core/dev.c:6090 netif_receive_skb_internal net/core/dev.c:6176 [inline] netif_receive_skb+0x57/0x630 net/core/dev.c:6235 tun_rx_batched+0x1df/0x980 drivers/net/tun.c:1485 tun_get_user+0x4ee0/0x6b40 drivers/net/tun.c:1938 tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1984 new_sync_write fs/read_write.c:593 [inline] vfs_write+0xb4b/0x1580 fs/read_write.c:686 ksys_write fs/read_write.c:738 [inline] __do_sys_write fs/read_write.c:749 [inline] Reported-by: syzbot+bf6ed459397e307c3ad2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686bc073.a00a0220.c7b3.0086.GAE@google.com/T/#u Fixes: 87b3593bed18 ("netfilter: flowtable: validate pppoe header") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20250707124517.614489-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	net: replace ND_PRINTK with dynamic debug	Wang Liang
	ND_PRINTK with val > 1 only works when the ND_DEBUG was set in compilation phase. Replace it with dynamic debug. Convert ND_PRINTK with val <= 1 to net_{err,warn}_ratelimited, and convert the rest to net_dbg_ratelimited. Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250708033342.1627636-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	Merge branch 'riscv-sophgo-add-ethernet-support-for-sg2042'	Jakub Kicinski
	Inochi Amaoto says: ==================== riscv: sophgo: Add ethernet support for SG2042 The ethernet controller of SG2042 is Synopsys DesignWare IP with tx clock. Add device id for it. This patch can only be tested on a SG2042 evb board, as pioneer does not expose this device. The user dts patch link: https://lore.kernel.org/linux-riscv/cover.1751700954.git.rabenda.cn@gmail.com ==================== Link: https://patch.msgid.link/20250708064052.507094-1-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	net: stmmac: platform: Add snps,dwmac-5.00a IP compatible string	Inochi Amaoto
	Add "snps,dwmac-5.30a" compatible string for 5.00a version that can avoid to define some platform data in the glue layer. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Link: https://patch.msgid.link/20250708064052.507094-4-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	net: stmmac: dwmac-sophgo: Add support for Sophgo SG2042 SoC	Inochi Amaoto
	Adds device id of the ethernet controller on the Sophgo SG2042 SoC. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Link: https://patch.msgid.link/20250708064052.507094-3-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: sophgo,sg2044-dwmac: Add support for Sophgo SG2042 dwmac	Inochi Amaoto
	The GMAC IP on SG2042 is a standard Synopsys DesignWare MAC (version 5.00a) with tx clock. Add necessary compatible string for this device. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250708064052.507094-2-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	Merge branch 'further-mt7988-devicetree-work'	Jakub Kicinski
	Frank Wunderlich says: ==================== further mt7988 devicetree work This series continues mt7988 devicetree work - Extend cpu frequency scaling with CCI - GPIO leds - Basic network-support (ethernet controller + builtin switch + SFP Cages) depencies (i hope this list is complete and latest patches/series linked): support interrupt-names is optional again as i re-added the reserved IRQs (they are not unusable as i thought and can allow features in future) https://patchwork.kernel.org/project/netdevbpf/patch/20250619132125.78368-2-linux@fw-web.de/ needs change in mtk ethernet driver for the sram to be read from separate node: https://patchwork.kernel.org/project/netdevbpf/patch/c2b9242229d06af4e468204bcf42daa1535c3a72.1751461762.git.daniel@makrotopia.org/ for SFP-Function (macs currently disabled): PCS clearance which is a 1.5 year discussion currently ongoing Daniel asked netdev for a way 2 go: https://lore.kernel.org/netdev/aEwfME3dYisQtdCj@pidgin.makrotopia.org/ e.g. something like this (one of): * https://patchwork.kernel.org/project/netdevbpf/patch/20250610233134.3588011-4-sean.anderson@linux.dev/ (v6) * https://patchwork.kernel.org/project/netdevbpf/patch/20250511201250.3789083-4-ansuelsmth@gmail.com/ (v4) * https://patchwork.kernel.org/project/netdevbpf/patch/ba4e359584a6b3bc4b3470822c42186d5b0856f9.1721910728.git.daniel@makrotopia.org/ full usxgmii driver: https://patchwork.kernel.org/project/netdevbpf/patch/07845ec900ba41ff992875dce12c622277592c32.1702352117.git.daniel@makrotopia.org/ first PCS-discussion is here: https://patchwork.kernel.org/project/netdevbpf/patch/8aa905080bdb6760875d62cb3b2b41258837f80e.1702352117.git.daniel@makrotopia.org/ some more here: https://lore.kernel.org/netdev/20250511201250.3789083-4-ansuelsmth@gmail.com/ and then dts nodes for sgmiisys+usxgmii+2g5 firmware when above depencies are solved the mac1/2 can be enabled and 2.5G phy/SFP slots will work. ==================== Link: https://patch.msgid.link/20250709111147.11843-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: dsa: mediatek,mt7530: add internal mdio bus	Frank Wunderlich
	Mt7988 buildin switch has own mdio bus where ge-phys are connected. Add related property for this. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20250709111147.11843-7-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: dsa: mediatek,mt7530: add dsa-port definition for mt7988	Frank Wunderlich
	Add own dsa-port binding for SoC with internal switch where only phy-mode 'internal' is valid. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20250709111147.11843-6-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: mediatek,net: add sram property	Frank Wunderlich
	Meditak Filogic SoCs (MT798x) have dedicated MMIO-SRAM for dma operations. MT7981 and MT7986 currently use static offset to ethernet MAC register which will be changed in separate patch once this way is accepted. Add "sram" property to map ethernet controller to dedicated mmio-sram node. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-5-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: mediatek,net: allow irq names	Frank Wunderlich
	In preparation for MT7988 and RSS/LRO allow the interrupt-names property. In this way driver can request the interrupts by name which is much more readable in the driver code and SoC's dtsi than relying on a specific order. Frame-engine-IRQs (fe0..3): MT7621, MT7628: 1 FE-IRQ MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now) MT7981, MT7986: 4 FE-IRQs (only two used by the driver for now) RSS/LRO IRQs (pdma0..3) additional only on Filogic (MT798x) with count of 4. So all IRQ-names (8) for Filogic. Set boundaries for all compatibles same as irq count. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-4-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: mediatek,net: allow up to 8 IRQs	Frank Wunderlich
	Increase the maximum IRQ count to 8 (4 FE + 4 RSS/LRO). Frame-engine-IRQs (max 4): MT7621, MT7628: 1 FE-IRQ MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now) MT7981, MT7986, MT7988: 4 FE-IRQs (only two used by the driver for now) Mediatek Filogic SoCs (mt798x) have 4 additional IRQs for RSS and/or LRO. So MT798x have 8 IRQs in total. MT7981 does not have a ethernet-node yet. MT7986 Ethernet node is updated with RSS/LRO IRQs in this series. MT7988 Ethernet node is added in this series. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-3-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	dt-bindings: net: mediatek,net: update mac subnode pattern for mt7988	Frank Wunderlich
	MT7888 have 3 Macs and so its nodes have names from mac0 - mac2. Update pattern to fix this. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-2-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	irqchip/irq-msi-lib: Fix build with PCI disabled	Arnd Bergmann
	The armada-370-xp irqchip fails in some randconfig builds because of a missing declaration: In file included from drivers/irqchip/irq-armada-370-xp.c:23: include/linux/irqchip/irq-msi-lib.h:25:39: error: 'struct msi_domain_info' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] Add a forward declaration for the msi_domain_info structure. [ tglx: Fixed up the subsystem prefix. Is it really that hard to get right? ] Fixes: e51b27438a10 ("irqchip: Make irq-msi-lib.h globally available") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20250710080021.2303640-1-arnd@kernel.org
2025-07-10	PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()	Himanshu Madhani
	pci_msix_write_tph_tag() takes the per device MSI descriptor mutex and then invokes msi_domain_get_virq(), which takes the same mutex again. That obviously results in a system hang which is exposed by a softlockup or lockdep warning. Move the lock guard after the invocation of msi_domain_get_virq() to fix this. [ tglx: Massage changelog by adding a proper explanation and removing the not really useful stacktrace ] Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH") Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com> Link: https://lore.kernel.org/all/20250708222530.1041477-1-himanshu.madhani@oracle.com
2025-07-10	ice: introduce ice_get_vf_by_dev() wrapper	Jacob Keller
	The ice_get_vf_by_id() function is used to obtain a reference to a VF structure based on its ID. The ice_sriov_set_msix_vec_count() function needs to get a VF reference starting from the VF PCI device, and uses pci_iov_vf_id() to get the VF ID. This pattern is currently uncommon in the ice driver. However, the live migration module will introduce many more such locations. Add a helper wrapper ice_get_vf_by_dev() which takes the VF PCI device and calls ice_get_vf_by_id() using pci_iov_vf_id() to get the VF ID. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: avoid rebuilding if MSI-X vector count is unchanged	Jacob Keller
	Commit 05c16687e0cc ("ice: set MSI-X vector count on VF") added support to change the vector count for VFs as part of ice_sriov_set_msix_vec_count(). This function modifies and rebuilds the target VF with the requested number of MSI-X vectors. Future support for live migration will add a call to ice_sriov_set_msix_vec_count() to ensure that a migrated VF has the proper MSI-X vector count. In most cases, this request will be to set the MSI-X vector count to its current value. In that case, no work is necessary. Rather than requiring the caller to check this, update the function to check and exit early if the vector count is already at the requested value. This avoids an unnecessary VF rebuild. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: use pci_iov_vf_id() to get VF ID	Jacob Keller
	The ice_sriov_set_msix_vec_count() obtains the VF device ID in a strange way by iterating over the possible VF IDs and calling pci_iov_virtfn_devfn to calculate the device and function combos and compare them to the pdev->devfn. This is unnecessary. The pci_iov_vf_id() helper already exists which does the reverse calculation of pci_iov_virtfn_devfn(), which is much simpler and avoids the loop construction. Use this instead. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: expose VF functions used by live migration	Jacob Keller
	The live migration process will require configuring the target VF with the data provided from the source host. A few helper functions in ice_sriov.c and ice_virtchnl.c will be needed for this process, but are currently static. Expose these functions in their respective headers so that the live migration module can use them during the migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: move ice_vsi_update_l2tsel to ice_lib.c	Jacob Keller
	A future change is going to need to call ice_vsi_update_l2tsel from a new context outside of ice_virtchnl.c Since this function deals with a generic VSI, move it into ice_lib.c to enable calling it from other places in the ice driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: save RSS hash configuration for migration	Jacob Keller
	The VF can program the RSS hash configuration over virtchnl. It does this by sending a u64 bitmask which represents the current hash configuration. It is not trivial to reverse the hardware configuration back to this hash set for migration. Instead, save the value to the ice_vf structure when its modified by the VF. The rss_hashcfg value is an 8-byte field. Make room for it in ice_vf by re-arranging some of the existing fields. There is a 4-byte gap after the first_vector_idx, and a 4-byte gap between max_tx_rate and vf_states. Move first_vector_idx into the later 4-byte gap, creating an 8 byte area where rss_hashcfg can be placed. Also move the num_msix field near min_tx_rate, filling 2 bytes of a 3 byte hole. The end result of these changes enables placing the rss_hashcfg field into the structure while also saving 8 bytes in size. It looks like there are a handful of more possible cleanups to reduce the size even further, but those have been left as a future cleanup. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	ice: add functions to get and set Tx queue context	Jacob Keller
	The live migration driver will need to save and restore the Tx queue context state from the hardware registers. This state contains both static fields which do not change during Tx traffic as well as dynamic fields which may change during Tx traffic. Unlike the Rx context, the Tx queue context is accessed indirectly from GLCOMM_QTX_CNTX_CTL and GLCOMM_QTX_CNTX_DATA registers. These registers are shared by multiple PFs on the same PCIe card. Multiple PFs cannot safely access the registers simultaneously, and there is no hardware semaphore or logic to control access. To handle this, introduce the txq_ctx_lock to the ice_adapter structure. This is similar to the ptp_gltsyn_time_lock. All PFs on the same adapter share this structure, and use it to serialize access to the registers to prevent error. Add a new functions to get and set the Tx queue context through the GLCOMM_QTX_CNTX_CTL interface. The hardware context values are stored in the registers using the same packed format as the Admin Queue buffer. The hardware buffer is 40 bytes wide, as it contains an additional 18 bytes of internal state not sent with the Admin Queue buffer. For this reason, a separate typedef and packing function must be used. We can share the same packed fields definitions because we never need to unpack the internal state. This is preferred, as it ensures the internal state is zero'd when writing into HW, and avoids issues with reading by u32 registers into a buffer of 22 bytes in length. Thanks to the typedefs, misuse of the API with the wrong size buffer can easily be caught at compile time. Note reading this data from hardware is essential because the current Tx queue context may be different from the context as initially programmed by the driver during VF initialization. When migrating a VF we must ensure the target VF has identical context as the source VF did. Co-developed-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	Merge branch 'virtio_udp_tunnel_08_07_2025' of ↵	Jakub Kicinski
	https://github.com/pabeni/linux-devel Paolo Abeni says: ==================== virtio: introduce GSO over UDP tunnel Some virtualized deployments use UDP tunnel pervasively and are impacted negatively by the lack of GSO support for such kind of traffic in the virtual NIC driver. The virtio_net specification recently introduced support for GSO over UDP tunnel, this series updates the virtio implementation to support such a feature. Currently the kernel virtio support limits the feature space to 64, while the virtio specification allows for a larger number of features. Specifically the GSO-over-UDP-tunnel-related virtio features use bits 65-69. The first four patches in this series rework the virtio and vhost feature support to cope with up to 128 bits. The limit is set by a define and could be easily raised in future, as needed. This implementation choice is aimed at keeping the code churn as limited as possible. For the same reason, only the virtio_net driver is reworked to leverage the extended feature space; all other virtio/vhost drivers are unaffected, but could be upgraded to support the extended features space in a later time. The last four patches bring in the actual GSO over UDP tunnel support. As per specification, some additional fields are introduced into the virtio net header to support the new offload. The presence of such fields depends on the negotiated features. New helpers are introduced to convert the UDP-tunneled skb metadata to an extended virtio net header and vice versa. Such helpers are used by the tun and virtio_net driver to cope with the newly supported offloads. Tested with basic stream transfer with all the possible permutations of host kernel/qemu/guest kernel with/without GSO over UDP tunnel support. ==================== Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	ice: add support for reading and unpacking Rx queue context	Jacob Keller
	In order to support live migration, the ice driver will need to read certain data from the Rx queue context. This is stored in the hardware in a packed format. Since we use <linux/packing.h> for the mapping between the packed hardware format and the unpacked structure, it is trivial to enable unpacking support via the unpack_fields() function. Add the ice_unpack_rxq_ctx() function based on the unpack_fields() API. Re-use the same field definitions from the packing implementation. Add ice_copy_rxq_ctx_from_hw() to copy the Rx queue context data from the hardware registers. Use these to implement ice_read_rxq_ctx() which will return the Rx queue context to the caller in its unpacked ice_rlan_ctx struct. This will enable the migration logic access to the relevant data about the Rx device queues. It can easily be copied to the target system as part of the migration payload, where it will be used to configure the Rx queues. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml 0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10	wifi: iwlwifi: mvm: fix scan request validation	Avraham Stern
	The scan request validation function uses bitwise and instead of logical and. Fix it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.3fbc1f27871b.I7a8ee91f463c1a2d9d8561c8232e196885d02c43@changeid
2025-07-10	wifi: iwlwifi: pcie: add a missing include	Miri Korenblit
	pcie/utils.h needs to include iwl-io.h for the iwl_read/iwl_write calls. Add it. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.716e8b54ebcb.If75c28a85b5ba4c2661bdf4ce20b97dbe7d2abb2@changeid
2025-07-10	wifi: iwlwifi: trans: remove retake_ownership parameter from sw_reset	Itamar Shalev
	Remove the retake_ownership parameter from the sw_reset function, as it was always set to true and is not needed by other opmodes. Simplify the sw_reset API function. Signed-off-by: Itamar Shalev <itamar.shalev@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.0a103d021815.I2a3da6f83aa691496a53a548bd73bddd4d4d2db8@changeid
2025-07-10	wifi: iwlwifi: assign a FW API range for GF	Miri Korenblit
	GF device is frozen on API 100, so it is not allowed to use FW APIs higher than that. Make sure of that by assigning a MIN and MAX API range for GF. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.3409de06db40.I2110ee6c0a2f5ff1e16156c5875f83d7a1723857@changeid
2025-07-10	wifi: iwlwifi: assign a FW API range for HR	Miri Korenblit
	HR device is frozen on API 100, so it is not allowed to use FW APIs higher than that. Make sure of that by assigning a MIN and MAX API range for HR. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.ea54c00de44d.I47340ecaefbf40bb0bd254485d242b7f39df85b1@changeid
2025-07-10	wifi: iwlwifi: pcie: accept new devices for MVM-only configs	Johannes Berg
	For newer MACs, the MVM opmode may be used for older firmware images or when the RF isn't EHT/WiFi7 capable. List such devices in the PCI device list when MLD isn't built. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.483c8112f655.Ic05530048fc0b67b1cd8772882a595d56b204e65@changeid
2025-07-10	wifi: iwlwifi: pcie: inform me when op mode leaving	Itamar Shalev
	Transport gen2 didn't inform ME when the op mode is leaving. Signed-off-by: Itamar Shalev <itamar.shalev@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.abd840f5e998.I3a3fe174ea55a30daa04a0a3e9a6264913677045@changeid
2025-07-10	wifi: iwlwifi: simplify iwl_poll_bits_mask return value	Itamar Shalev
	Update iwl_poll_bits_mask to return 0 on success or an error code. Remove timing information from the return value, as it is unused. Signed-off-by: Itamar Shalev <itamar.shalev@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.f77b9f484a78.Iae8ef99a94e25c23044e2c36244cda2b55328447@changeid
2025-07-10	wifi: iwlwifi: mvm: remove support for iwl_wowlan_status_v9	Miri Korenblit
	FWs with this version are no longer supported on any device. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.22864efb5074.I51f270f8848970fd2ca1078c14ad31f4a8853e7d@changeid
2025-07-10	wifi: iwlwifi: mvm: remove support for iwl_wowlan_status_v12	Miri Korenblit
	FWs with this version are no longer supported on any device. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.1b9177bfbe1d.I53c1527cc5097f05df352b6f2f99282b00a5d7ac@changeid
2025-07-10	wifi: iwlwifi: add a reference to iwl_wowlan_info_notif_v3	Miri Korenblit
	Mark this structure as one of the structures that represent WOWLAN_INFO_NOTIFICATION Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.19ebfa430c5c.Ie5aca3f0af11cc3137c6b6862a13777bae0cb06b@changeid
2025-07-10	wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v2	Miri Korenblit
	FWs with this version are no longer supported on any device. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.d92f63207232.I8961ffbe04d0d9439d48a17840497ac926967914@changeid
2025-07-10	wifi: iwlwifi: bump minimum API version for SO/MA/TY	Miri Korenblit
	Stop supporting older FWs. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709230308.64f504f3690d.Idc95ca09101e52b4980b292945abe944c24fc5d1@changeid
2025-07-10	wifi: iwlwifi: assign a FW API range for JF	Miri Korenblit
	JF device is frozen on API 77. This prevented us from bumping the minimum FW API of SO (and get rid of older FWs). This is because SO can be combined with JF and then FW API 77 should be used. Now as we have separate FW API ranges for the mac and the crf, we can define for JF its own FW API range. This will allow bumping the minimum FW API of SO Independently. Do that now. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250709200543.1628666-3-miriam.rachel.korenblit@intel.com
2025-07-10	wifi: iwlwifi: handle non-overlapping API ranges	Miri Korenblit
	The option to set an api_version_min/max also to the RF was added. In the case that both the MAC and the RF has a range defined, we take the narrower range of both. This doesn't work for non-overlapping ranges. In this case, we should just take the lower range of both. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709200543.1628666-2-miriam.rachel.korenblit@intel.com
2025-07-10	Merge tag 'net-6.16-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth. Current release - regressions: - tcp: refine sk_rcvbuf increase for ooo packets - bluetooth: fix attempting to send HCI_Disconnect to BIS handle - rxrpc: fix over large frame size warning - eth: bcmgenet: initialize u64 stats seq counter Previous releases - regressions: - tcp: correct signedness in skb remaining space calculation - sched: abort __tc_modify_qdisc if parent class does not exist - vsock: fix transport_{g2h,h2g} TOCTOU - rxrpc: fix bug due to prealloc collision - tipc: fix use-after-free in tipc_conn_close(). - bluetooth: fix not marking Broadcast Sink BIS as connected - phy: qca808x: fix WoL issue by utilizing at8031_set_wol() - eth: am65-cpsw-nuss: fix skb size by accounting for skb_shared_info Previous releases - always broken: - netlink: fix wraparounds of sk->sk_rmem_alloc. - atm: fix infinite recursive call of clip_push(). - eth: - stmmac: fix interrupt handling for level-triggered mode in DWC_XGMAC2 - rtsn: fix a null pointer dereference in rtsn_probe()" * tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) net/sched: sch_qfq: Fix null-deref in agg_dequeue rxrpc: Fix oops due to non-existence of prealloc backlog struct rxrpc: Fix bug due to prealloc collision MAINTAINERS: remove myself as netronome maintainer selftests/net: packetdrill: add tcp_ooo-before-and-after-accept.pkt tcp: refine sk_rcvbuf increase for ooo packets net/sched: Abort __tc_modify_qdisc if parent class does not exist net: ethernet: ti: am65-cpsw-nuss: Fix skb size by accounting for skb_shared_info net: thunderx: avoid direct MTU assignment after WRITE_ONCE() selftests/tc-testing: Create test case for UAF scenario with DRR/NETEM/BLACKHOLE chain atm: clip: Fix NULL pointer dereference in vcc_sendmsg() atm: clip: Fix infinite recursive call of clip_push(). atm: clip: Fix memory leak of struct clip_vcc. atm: clip: Fix potential null-ptr-deref in to_atmarpd(). net: phy: smsc: Fix link failure in forced mode with Auto-MDIX net: phy: smsc: Force predictable MDI-X state on LAN87xx net: phy: smsc: Fix Auto-MDIX configuration when disabled by strap net: stmmac: Fix interrupt handling for level-triggered mode in DWC_XGMAC2 rxrpc: Fix over large frame size warning net: airoha: Fix an error handling path in airoha_probe() ...
2025-07-10	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fixes from Paolo Bonzini: "Many patches, pretty much all of them small, that accumulated while I was on vacation. ARM: - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1 - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3 - Also gracefully fail initialising pKVM if the carveout allocation fails - Fix the computing of the minimum MMIO range required for the host on stage-2 fault - Fix the generation of the GICv3 Maintenance Interrupt in nested mode x86: - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID - Advertise supported TDX TDVMCALLs to userspace - Pass SetupEventNotifyInterrupt arguments to userspace - Fix TSC frequency underflow" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: avoid underflow when scaling TSC frequency KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes KVM: arm64: Don't free hyp pages with pKVM on GICv2 KVM: arm64: Fix error path in init_hyp_mode() KVM: arm64: Adjust range correctly during host stage-2 faults KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi() KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush KVM: SVM: Add missing member in SNP_LAUNCH_START command structure Documentation: KVM: Fix unexpected unindent warnings KVM: selftests: Add back the missing check of MONITOR/MWAIT availability KVM: Allow CPU to reschedule while setting per-page memory attributes KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table. KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10	drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too	Hans de Goede
	It turns out that the fixup from vlv_fixup_mipi_sequences() is necessary for some DSI panel's with version 2 mipi-sequences too. Specifically the Acer Iconia One 8 A1-840 (not to be confused with the A1-840FHD which is different) has the following sequences: BDB block 53 (1284 bytes) - MIPI sequence block: Sequence block version v2 Panel 0 * Sequence 2 - MIPI_SEQ_INIT_OTP GPIO index 9, source 0, set 0 (0x00) Delay: 50000 us GPIO index 9, source 0, set 1 (0x01) Delay: 6000 us GPIO index 9, source 0, set 0 (0x00) Delay: 6000 us GPIO index 9, source 0, set 1 (0x01) Delay: 25000 us Send DCS: Port A, VC 0, LP, Type 39, Length 5, Data ff aa 55 a5 80 Send DCS: Port A, VC 0, LP, Type 39, Length 3, Data 6f 11 00 ... Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 29 Delay: 120000 us Sequence 4 - MIPI_SEQ_DISPLAY_OFF Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 28 Delay: 105000 us Send DCS: Port A, VC 0, LP, Type 05, Length 2, Data 10 00 Delay: 10000 us Sequence 5 - MIPI_SEQ_ASSERT_RESET Delay: 10000 us GPIO index 9, source 0, set 0 (0x00) Notice how there is no MIPI_SEQ_DEASSERT_RESET, instead the deassert is done at the beginning of MIPI_SEQ_INIT_OTP, which is exactly what the fixup from vlv_fixup_mipi_sequences() fixes up. Extend it to also apply to v2 sequences, this fixes the panel not working on the Acer Iconia One 8 A1-840. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14605 Signed-off-by: Hans de Goede <hansg@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250703143824.7121-1-hansg@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 11895f375939d60efe7ed5dddc1cffe2e79f976c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10	dm-bufio: fix sched in atomic context	Sheng Yong
	If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP is enabled for dm-bufio. However, when bufio tries to evict buffers, there is a chance to trigger scheduling in spin_lock_bh, the following warning is hit: BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2 preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 4 locks held by kworker/2:2/123: #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 #305 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: dm_bufio_cache do_global_cleanup Call Trace: <TASK> dump_stack_lvl+0x53/0x70 __might_resched+0x360/0x4e0 do_global_cleanup+0x2f5/0x710 process_one_work+0x7db/0x1970 worker_thread+0x518/0xea0 kthread+0x359/0x690 ret_from_fork+0xf3/0x1b0 ret_from_fork_asm+0x1a/0x30 </TASK> That can be reproduced by: veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb SIZE=$(blockdev --getsz /dev/vda) dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet" mount /dev/dm-0 /mnt -o ro echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes [read files in /mnt] Cc: stable@vger.kernel.org # v6.4+ Fixes: 450e8dee51aa ("dm bufio: improve concurrent IO performance") Signed-off-by: Wang Shuai <wangshuai12@xiaomi.com> Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-07-10	Merge branch 'net-dsa-rzn1_a5psw-add-compile_test'	Paolo Abeni
	Rosen Penev says: ==================== net: dsa: rzn1_a5psw: add COMPILE_TEST Allows the various bots to test compilation. Also threw in a small devm conversion for enabling clocks. ==================== Link: https://patch.msgid.link/20250708014144.2514-1-rosenp@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10	net: dsa: rzn1_a5psw: use devm to enable clocks	Rosen Penev
	The remove function has these in the wrong order. The switch should be unregistered last. Simpler to use devm so that the right thing is done. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20250708014144.2514-3-rosenp@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10	net: dsa: rzn1_a5psw: add COMPILE_TEST	Rosen Penev
	There's no architecture specific requirement for it to compile. Allows the bots to test compilation properly. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250708014144.2514-2-rosenp@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10	net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt	Jason Xing
	This patch provides a setsockopt method to let applications leverage to adjust how many descs to be handled at most in one send syscall. It mitigates the situation where the default value (32) that is too small leads to higher frequency of triggering send syscall. Considering the prosperity/complexity the applications have, there is no absolutely ideal suggestion fitting all cases. So keep 32 as its default value like before. The patch does the following things: - Add XDP_MAX_TX_SKB_BUDGET socket option. - Set max_tx_budget to 32 by default in the initialization phase as a per-socket granular control. - Set the range of max_tx_budget as [32, xs->tx->nentries]. The idea behind this comes out of real workloads in production. We use a user-level stack with xsk support to accelerate sending packets and minimize triggering syscalls. When the packets are aggregated, it's not hard to hit the upper bound (namely, 32). The moment user-space stack fetches the -EAGAIN error number passed from sendto(), it will loop to try again until all the expected descs from tx ring are sent out to the driver. Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of sendto() and higher throughput/PPS. Here is what I did in production, along with some numbers as follows: For one application I saw lately, I suggested using 128 as max_tx_budget because I saw two limitations without changing any default configuration: 1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind this was I counted how many descs are transmitted to the driver at one time of sendto() based on [1] patch and then I calculated the possibility of hitting the upper bound. Finally I chose 128 as a suitable value because 1) it covers most of the cases, 2) a higher number would not bring evident results. After twisting the parameters, a stable improvement of around 4% for both PPS and throughput and less resources consumption were found to be observed by strace -c -p xxx: 1) %time was decreased by 7.8% 2) error counter was decreased from 18367 to 572 [1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/ Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>