linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-06-16	gve: Add support to query the nic clock	Kevin Yang
	Query the nic clock and store the results. The timestamp delivered in descriptors has a wraparound time of ~4 seconds so 250ms is chosen as the sync cadence to provide a balance between performance, and drift potential when we do start associating host time and nic time. Leverage PTP's aux_work to query the nic clock periodically. Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	gve: Add adminq lock for queues creation and destruction	Ziwei Xiao
	Adminq commands for queues creation and destruction were not consistently protected by the driver's adminq_lock. This was previously benign as these operations were always initiated from contexts holding kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided serialization. Upcoming PTP aux_work will issue adminq commands directly from the driver to read the NIC clock, without such kernel lock protection. To prevent race conditions with this new PTP work, this patch ensures the adminq_lock is held during queues creation and destruction. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	gve: Add initial PTP device support	Harshitha Ramamurthy
	If the device supports reading of the nic clock, add support to initialize and register the PTP clock. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-4-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	gve: Add adminq command to report nic timestamp	John Fraker
	Add an adminq command to read NIC's hardware clock. The driver allocates dma memory and passes that dma memory address to the device. The device then writes the clock to the given address. Signed-off-by: Jeff Rogers <jefrogers@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	gve: Add device option for nic clock synchronization	John Fraker
	Add the device option and negotiation with the device for clock synchronization with the nic. This option is necessary before the driver will advertise support for hardware timestamping or other related features. Signed-off-by: Jeff Rogers <jefrogers@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-2-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net: mana: Add handler for hardware servicing events	Haiyang Zhang
	To collaborate with hardware servicing events, upon receiving the special EQE notification from the HW channel, remove the devices on this bus. Then, after a waiting period based on the device specs, rescan the parent bus to recover the devices. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1749834034-18498-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	Merge branch 'netpoll-untangle-netconsole-and-netpoll'	Jakub Kicinski
	Breno Leitao says: ==================== netpoll: Untangle netconsole and netpoll Initially netpoll and netconsole were created together, and some functions are in the wrong file. Seperate netconsole-only functions in netconsole, avoiding exports. 1. Expose netpoll logging macros in the public header to enable consistent log formatting across netpoll consumers. 2. Relocate netconsole-specific functions from netpoll to the netconsole module where they are actually used, reducing unnecessary coupling. 3. Remove unnecessary function exports 4. Rename netpoll parsing functions in netconsole to better reflect their specific usage. 5. Create a test to check that cmdline works fine. This was in my todo list since [1], this was a good time to add it here to make sure this patchset doesn't regress. PS: The code was split in a way that it is easy to review. When copying the functions from netpoll to netconsole, I do not change than other than adding `static`. This will make checkpatch unhappy, but, further patches will address the issues. It is done this way to make it easy for reviewers. Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/ [1] v2: https://lore.kernel.org/20250611-rework-v2-0-ab1d92b458ca@debian.org v1: https://lore.kernel.org/20250610-rework-v1-0-7cfde283f246@debian.org ==================== Link: https://patch.msgid.link/20250613-rework-v3-0-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	selftests: net: add netconsole test for cmdline configuration	Breno Leitao
	Add a new selftest to verify netconsole module loading with command line arguments. This test exercises the init_netconsole() path and validates proper parsing of the netconsole= parameter format. The test: - Loads netconsole module with cmdline configuration instead of dynamic reconfiguration - Validates message transmission through the configured target - Adds helper functions for cmdline string generation and module validation This complements existing netconsole selftests by covering the module initialization code path that processes boot-time parameters. This test is useful to test issues like the one described in [1]. Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/ [1] Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-8-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	selftests: net: Refactor cleanup logic in lib_netcons.sh	Breno Leitao
	Extract the network device and namespace cleanup logic from the cleanup() function into a new do_cleanup() helper in lib_netcons.sh. The do_cleanup() function only unconfigure the network and printk, while cleanup() cleans the netconsole targets plus the network and printk. This refactoring let this code to be reused in cases netconsole dynamic is not being used, as in the upcoming patch. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-7-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netconsole: improve code style in parser function	Breno Leitao
	Split assignment from conditional checks and use preferred null pointer check style (!delim instead of == NULL) in netconsole_parser_cmdline(). This improves code readability and follows kernel coding style conventions. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-6-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netconsole: rename functions to better reflect their purpose	Breno Leitao
	Rename netpoll_parse_options() to netconsole_parser_cmdline() and netpoll_print_options() to netconsole_print_banner() to better describe what these functions actually do within the netconsole context. Also fix minor code style issues including variable declaration ordering and spacing. These functions are specific to netconsole functionality rather than general netpoll operations, so the new names better reflect their actual purpose. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-5-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netpoll: move netpoll_print_options to netconsole	Breno Leitao
	Move netpoll_print_options() from net/core/netpoll.c to drivers/net/netconsole.c and make it static. This function is only used by netconsole, so there's no need to export it or keep it in the public netpoll API. This reduces the netpoll API surface and improves code locality by keeping netconsole-specific functionality within the netconsole driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-4-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netpoll: relocate netconsole-specific functions to netconsole module	Breno Leitao
	Move netpoll_parse_ip_addr() and netpoll_parse_options() from the generic netpoll module to the netconsole module where they are actually used. These functions were originally placed in netpoll but are only consumed by netconsole. This refactoring improves code organization by: - Removing unnecessary exported symbols from netpoll - Making netpoll_parse_options() static (no longer needs global visibility) - Reducing coupling between netpoll and netconsole modules The functions remain functionally identical - this is purely a code reorganization to better reflect their actual usage patterns. Here are the changes: 1) Move both functions from netpoll to netconsole 2) Add static to netpoll_parse_options() 3) Removed the EXPORT_SYMBOL() PS: This diff does not change the function format, so, it is easy to review, but, checkpatch will not be happy. A follow-up patch will address the current issues reported by checkpatch. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-3-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netpoll: expose netpoll logging macros in public header	Breno Leitao
	Move np_info(), np_err(), and np_notice() macros from internal implementation to the public netpoll header file to make them available for use by netpoll consumers. These logging macros provide consistent formatting for netpoll-related messages by automatically prefixing log output with the netpoll instance name. The goal is to use the exact same format that is being displayed today, instead of creating something netconsole-specific. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-2-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	netpoll: remove __netpoll_cleanup from exported API	Breno Leitao
	Since commit 97714695ef90 ("net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal"), netconsole no longer uses __netpoll_cleanup(). With no remaining users, remove this function from the exported netpoll API. The function remains available internally within netpoll for use by netpoll_cleanup(). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-1-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	ptp: Use ratelimite for freerun error message	Breno Leitao
	Replace pr_err() with pr_err_ratelimited() in ptp_clock_settime() to prevent log flooding when the physical clock is free running, which happens on some of my hosts. This ensures error messages are rate-limited and improves kernel log readability. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250613-ptp-v1-1-ee44260ce9e2@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	selftests/tc-testing: sfq: check perturb timer values	Eric Dumazet
	Add one test to check that the kernel rejects a negative perturb timer. Add a second test checking that the kernel rejects a too big perturb timer. All test results: 1..2 ok 1 cdc1 - Check that a negative perturb timer is rejected ok 2 a9f0 - Check that a too big perturb timer is rejected Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250613064136.3911944-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	Merge branch 'net-phy-make-phy_package-a-separate-module'	Jakub Kicinski
	Heiner Kallweit says: ==================== net: phy: make phy_package a separate module Only a handful of PHY drivers needs the PHY package functionality, therefore make it a separate module which is built only if needed. ==================== Link: https://patch.msgid.link/eec346a4-e903-48af-8150-0191932a7a0b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net: phy: add Kconfig symbol PHY_PACKAGE	Heiner Kallweit
	Only a handful of PHY drivers needs the PHY package functionality, therefore build the module only if needed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/42c05496-61b2-4b09-b853-3d99b3dfe95c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net: phy: make phy_package a separate module	Heiner Kallweit
	Make phy_package a separate module, so that this code is only loaded if needed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/66bb4cce-b6a3-421e-9a7b-5d4a0c75290e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net: phy: move __phy_package_[read\|write]_mmd to phy_package.c	Heiner Kallweit
	Move both functions to phy_package.c, so that phy_core.c no longer has a dependency on phy_package.c (phy_package_address). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/8956fa53-3eda-4079-8203-a8fddcc17bf3@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net: stmmac: remove pcs_get_adv_lp() support	Russell King (Oracle)
	It appears that the GMAC_ANE_ADV and GMAC_ANE_LPA registers are only available for TBI and RTBI PHY interfaces. In commit 482b3c3ba757 ("net: stmmac: Drop TBI/RTBI PCS flags") support for these was dropped, and thus it no longer makes sense to access these registers. Remove the _get_adv_lp() functions, and the now redundant struct rgmii_adv and STMMAC_PCS_ definitions. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1uPkbT-004EyG-OQ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	net/tcp_ao: tracing: Hide tcp_ao events under CONFIG_TCP_AO	Steven Rostedt
	Several of the tcp_ao events are only called when CONFIG_TCP_AO is defined. As each event can take up to 5K regardless if they are used or not, it's best not to define them when they are not used. Add #ifdef around these events when they are not used. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250612094616.4222daf0@batman.local.home Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	scsi: elx: efct: Fix memory leak in efct_hw_parse_filter()	Vitaliy Shevtsov
	strsep() modifies the address of the pointer passed to it so that it no longer points to the original address. This means kfree() gets the wrong pointer. Fix this by passing unmodified pointer returned from kstrdup() to kfree(). Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 4df84e846624 ("scsi: elx: efct: Driver initialization routines") Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru> Link: https://lore.kernel.org/r/20250612163616.24298-1-v.shevtsov@mt-integration.ru Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-16	pldmfw: Select CRC32 when PLDMFW is selected	Simon Horman
	pldmfw calls crc32 code and depends on it being enabled, else there is a link error as follows. So PLDMFW should select CRC32. lib/pldmfw/pldmfw.o: In function `pldmfw_flash_image': pldmfw.c:(.text+0x70f): undefined reference to `crc32_le_base' This problem was introduced by commit b8265621f488 ("Add pldmfw library for PLDM firmware update"). It manifests as of commit d69ea414c9b4 ("ice: implement device flash update via devlink"). And is more likely to occur as of commit 9ad19171b6d6 ("lib/crc: remove unnecessary prompt for CONFIG_CRC32 and drop 'default y'"). Found by chance while exercising builds based on tinyconfig. Fixes: b8265621f488 ("Add pldmfw library for PLDM firmware update") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250613-pldmfw-crc32-v1-1-f3fad109eee6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16	EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh	Avadhut Naik
	AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers (UMC) per processor die. The amd64_edac driver, however, assumes only 2 UMCs are supported since max_mcs variable for the models has not been explicitly set to 4. The same results in incomplete or incorrect memory information being logged to dmesg by the module during initialization in some instances. Fixes: 6c79e42169fe ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh") Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/ Reported-by: reox <mailinglist@reox.at> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
2025-06-16	lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch()	Eric Biggers
	For some reason arm64's Poly1305 code got changed to ignore the padbit argument. As a result, the output is incorrect when the message length is not a multiple of 16 (which is not reached with the standard ChaCha20Poly1305, but bcachefs could reach this). Fix this. Fixes: a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface") Reported-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20250616010654.367302-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-06-16	x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl ↵	Qinyun Tan
	subsystem In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain structure representing a NUMA node relies on the cacheinfo interface (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map) for monitoring. The L3 cache information of a SNC NUMA node determines which domains are summed for the "top level" L3-scoped events. rdt_mon_domain::ci is initialized using the first online CPU of a NUMA node. When this CPU goes offline, its shared_cpu_map is cleared to contain only the offline CPU itself. Subsequently, attempting to read counters via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning zero values for "top-level events" without any error indication. Replace the cacheinfo references in struct rdt_mon_domain and struct rmid_read with the cacheinfo ID (a unique identifier for the L3 cache). rdt_domain_hdr::cpu_mask contains the online CPUs associated with that domain. When reading "top-level events", select a CPU from rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine valid CPUs for reading RMID counter via the MSR interface. Considering all CPUs associated with the L3 cache improves the chances of picking a housekeeping CPU on which the counter reading work can be queued, avoiding an unnecessary IPI. Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files") Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
2025-06-16	libeth: xdp, xsk: access adjacent u32s as u64 where applicable	Alexander Lobakin
	On 64-bit systems, writing/reading one u64 is faster than two u32s even when they're are adjacent in a struct. The compilers won't guarantee they will combine those; I observed both successful and unsuccessful attempts with both GCC and Clang, and it's not easy to say what it depends on. There's a few places in libeth_xdp winning up to several percent from combined access (both performance and object code size, especially when unrolling). Add __LIBETH_WORD_ACCESS and use it there on LE. Drivers are free to optimize HW-specific callbacks under the same definition. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xsk: add XSkFQ refill and XSk wakeup helpers	Alexander Lobakin
	XSkFQ refill is pretty generic across the drivers minus FQ descriptor filling and can easily be unified with one inline callback. XSk wakeup is usually not, but here, instead of commonly used "SW interrupts", I picked firing an IPI. In most tests, it showed better performance; it also provides better control for userspace on which CPU will handle the xmit, as SW interrupts honor IRQ affinity no matter which core produces XSk xmit descs (while XDPSQs are associated 1:1 with cores having the same ID). Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xsk: add XSk Rx processing support	Alexander Lobakin
	Add XSk counterparts for preparing XSk &libeth_xdp_buff (adding head and frags), running the program, and handling the verdict, inc. XDP_PASS. Shortcuts in comparison with regular Rx: frags and all verdicts except XDP_REDIRECT are under unlikely() and out of line; no checks for XDP program presence as it's always true for XSk. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # optimizations Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xsk: add XSk xmit functions	Alexander Lobakin
	Reuse core sending functions to send XSk xmit frames. Both metadata and no metadata pools/driver are supported. libeth_xdp also provides generic XSk metadata ops, currently with the checksum offload only and for cases when HW doesn't require supplying L3/L4 checksum offsets. Drivers are free to pass their own ops. &libeth_xdp_tx_bulk is not used here as it would be redundant; pool->tx_descs are accessed directly. Fake "libeth_xsktmo" is needed to hide implementation details from the drivers when they want to use the generic ops: the original struct is defined in the same file where dev->xsk_tx_metadata_ops gets set to avoid duplication of slowpath; at the same time; XSk xmit functions use local "fast" copy to inline XMO callbacks. Tx descriptor filling loop is unrolled by 8. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # optimizations Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xsk: add XSk XDP_TX sending helpers	Alexander Lobakin
	Add Xsk counterparts for XDP_TX buffer sending and completion. The same base structures and functions used from the libeth_xdp core, with adjustments to that XSk Rx always operates on &xdp_buff_xsk for both head and frags. And unlike regular Rx, here unlikely() are used for frags, as the header split gives no benefits for XSk Rx, at least for now. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add RSS hash hint and XDP features setup helpers	Alexander Lobakin
	End the XDP section by adding helpers to setup XDP features, flipping .ndo_xdp_xmit() support at runtime (in case when it's not always on), and calculating the queue clean/refill threshold. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add templates for building driver-side callbacks	Alexander Lobakin
	Defining driver-specific functions to pass to libeth_xdp functions can induce boilerplates and/or look a bit cryptic with all those layers of indirection. On the other hand, this indirection is needed to allow compilers to uninline big functions even when passed to __always_inline helpers (too much inlining also hurts performance in some cases), plus to reuse some XDP helpers in XSk code. Add macros to quickly build them, with the detailed kdoc. They take names of the actual callbacks for filling a Tx descriptor and other purely HW-specific things and wrap them appropriately. LIBETH_XDP_DEFINE_{BEGIN,END}() is needed for GCC 8+ unfortunately to let the drivers control which functions will be static and which global without hitting `-Wold-style-declaration`. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add XDP prog run and verdict result handling	Alexander Lobakin
	Running a prog and handling the verdicts, up to napi_gro_receive() is also pretty generic code not really differing between vendors (except for Tx descriptor filling and Rx descriptor parsing). Define a couple inlines to do that. The inline callbacks a driver needs to pass is mentioned above: Tx descriptor filling for XDP_TX, populating skb with the descriptor data for XDP_PASS, finalizing XDPSQs after the polling loop for XDP_TX (kicking the HW to start sending). The populate callback passes only &libeth_xdp_buff assuming buff::desc pointer is enough, plus you can always get the corresponding Rx queue structure via container_of(buff::rxq). If not, a driver can extend the buff with more fields directly on the stack without touching libeth_xdp definitions. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff	Alexander Lobakin
	Add convenience helpers to build an &xdp_buff. This means: general initialization before the NAPI loop, adding head, adding frags etc. libeth_xdp_process_buff() is the same what everybody have in their drivers: dma_sync_for_cpu(); if (!frag) { add_head(); prefetch(); } else { add_frag(); } Note that I don't use net_prefetch(), sticking to the original prefetch(). In none of my tests prefetching 128 bytes yielded better perf than 64 bytes. That might differ if the headers are huge enough, but then additional tunneling etc. overhead takes place, you either way won't win a lot. &libeth_xdp_stash is for cases when you exit the polling loop without finishing building the buff. If that happens, you need to store the buffer in the queue structure until the next loop and then restore it. It makes no sense to place a whole full &xdp_buff there. Define a minimal structure, which would store only the fields essential to restore it. I was able to pack it into 16 bytes, which is only 8 bytes bigger than `struct sk_buff *skb` on x64. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add XDPSQ cleanup timers	Alexander Lobakin
	When XDP Tx queues are not interrupt-driven but use lazy cleaning, i.e. only when there are less than `threshold` free descriptors left, we also need cleanup timers to avoid &xdp_buff and &xdp_frame stall for too long, especially with Page Pool (it warns every about inflight pages every 60 second). Let's say we sent 256 frames and don't need to send more, but we clean only when the number of pending items >= 384. In that case, those 256 will stall until 128 more are sent. For this, add simple helpers to run a timer which will clean the queue regardless, after 1 second of the last send. The timer is triggered when finalizing the queue. As long as there is regular active traffic, the timer doesn't fire. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add XDPSQ locking helpers	Alexander Lobakin
	Unfortunately, it's not always possible to allocate max(num_rxqs, nr_cpu_ids) even on hi-end NICs. To mitigate this, add simple locking helpers to libeth_xdp. As long as XDPSQs are not shared, the whole functionality is gated behind a static lock. Otherwise, each bulk flush locks the queue for the time of cleaning and filling the descriptors. As long as this particular queue is not used by more than 1 CPU, the impact is minimal (runtime check for boolean twice per 16+ descriptors). Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # static key Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add XDPSQE completion helpers	Alexander Lobakin
	Similarly to libeth_tx_complete(), add libeth_xdp_complete_tx() to handle XDP_TX and xmit buffers. Both use bulk return under the hood. Also add out of line libeth_tx_complete_any() which handles both regular and XDP frames (if libeth_xdp is loaded), for example, to call on queue destroy, where we don't need inlining but convenience. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add .ndo_xdp_xmit() helpers	Alexander Lobakin
	Add helpers for implementing .ndo_xdp_xmit(). Same as for XDP_TX, accumulate up to 16 DMA-mapped frames on the stack, then flush. If DMA mapping is failed for some reason, don't try mapping further frames, but still flush what was already prepared. DMA address of a head frame is stored in its headroom, assuming it has enough of it for an 8 (or 4) byte value. In addition to @prep and @xmit driver callbacks in XDP_TX, xmit also needs @finalize to kick the XDPSQ after filling. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: xdp: add XDP_TX buffers sending	Alexander Lobakin
	Start adding XDP-specific code to libeth, namely handling XDP_TX buffers (only sending). The idea is that we accumulate up to 16 buffers on the stack, then, if either the limit is reached or the polling is finished, flush them at once with only one XDPSQ cleaning (if needed). The main sending function will be aware of the sending budget and already have all the info to send the buffers, so it can't fail. Drivers need to provide 2 inline callbacks to the main sending function: for cleaning an XDPSQ and for filling descriptors; the library code takes care of the rest. Note that unlike the generic code, multi-buffer support is not wrapped here with unlikely() to not hurt header split setups. &libeth_xdp_buff is a simple extension over &xdp_buff which has a direct pointer to the corresponding Rx descriptor (and, luckily, precisely 1 CL size and 16-byte alignment on x86_64). Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # xmit logic Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: support native XDP and register memory model	Alexander Lobakin
	Expand libeth's Page Pool functionality by adding native XDP support. This means picking the appropriate headroom and DMA direction. Also, register all the created &page_pools as XDP memory models. A driver then can call xdp_rxq_info_attach_page_pool() when registering its RxQ info. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth: convert to netmem	Alexander Lobakin
	Back when the libeth Rx core was initially written, devmem was a draft and netmem_ref didn't exist in the mainline. Now that it's here, make libeth MP-agnostic before introducing any new code or any new library users. When it's known that the created PP/FQ is for header buffers, use faster "unsafe" underscored netmem <--> virt accessors as netmem_is_net_iov() is always false in that case, but consumes some cycles (bit test + true branch). Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	libeth, libie: clean symbol exports up a little	Alexander Lobakin
	Change EXPORT_SYMBOL_NS_GPL(x, "LIBETH") to EXPORT_SYMBOL_GPL(x) + DEFAULT_SYMBOL_NAMESPACE "LIBETH" to make the code more compact. Also, explicitly include <linux/export.h> to satisfy new requirements from scripts/misc-check. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16	Merge tag 'x86_urgent_for_6.16-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: "This is a pretty scattered set of fixes. The majority of them are further fixups around the recent ITS mitigations. The rest don't really have a coherent story: - Some flavors of Xen PV guests don't support large pages, but the set_memory.c code assumes all CPUs support them. Avoid problems with a quick CPU feature check. - The TDX code has some wrappers to help retry calls to the TDX module. They use function pointers to assembly functions and the compiler usually generates direct CALLs. But some new compilers, plus -Os turned them in to indirect CALLs and the assembly code was not annotated for indirect calls. Force inlining of the helper to fix it up. - Last, a FRED issue showed up when single-stepping. It's fine when using an external debugger, but was getting stuck returning from a SIGTRAP handler otherwise. Clear the FRED 'swevent' bit to ensure that forward progress is made" * tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "mm/execmem: Unify early execmem_cache behaviour" x86/its: explicitly manage permissions for ITS pages x86/its: move its_pages array to struct mod_arch_specific x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set x86/mm/pat: don't collapse pages without PSE set x86/virt/tdx: Avoid indirect calls to TDX assembly functions selftests/x86: Add a test to detect infinite SIGTRAP handler loop x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
2025-06-16	scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port()	Maurizio Lombardi
	The function core_scsi3_decode_spec_i_port(), in its error code path, unconditionally calls core_scsi3_lunacl_undepend_item() passing the dest_se_deve pointer, which may be NULL. This can lead to a NULL pointer dereference if dest_se_deve remains unset. SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg Unable to handle kernel paging request at virtual address dfff800000000012 Call trace: core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P) core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod] core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod] target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod] Fix this by adding a NULL check before calling core_scsi3_lunacl_undepend_item() Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-16	tools arch x86: Sync the msr-index.h copy with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes from these csets: 159013a7ca18c271 ("x86/its: Enumerate Indirect Target Selection (ITS) bug") f4138de5e41fae1a ("x86/msr: Standardize on u64 in <asm/msr-index.h>") ec980e4facef8110 ("perf/x86/intel: Support auto counter reload") That cause no changes to tooling as it doesn't include a new MSR to be captured by the tools/perf/trace/beauty/tracepoints/x86_msr.sh script. Just silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/aEtAUg83OQGx8Kay@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16	tools headers: Syncronize linux/build_bug.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in: 243c90e917f5cfc9 ("build_bug.h: more user friendly error messages in BUILD_BUG_ON_ZERO()") This also needed to pick the __BUILD_BUG_ON_ZERO_MSG() in linux/compiler.h, that needed to be polished to avoid hitting old clang problems with _Static_assert on arrays of structs: Debian clang version 11.0.1-2~deb10u1 Debian clang version 11.0.1-2~deb10u1 $ make NO_LIBTRACEEVENT=1 ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf CC=clang <SNIP> btf_dump.c:895:18: error: type name does not allow storage class to be specified for (i = 0; i < ARRAY_SIZE(pads); i++) { ^ /git/perf-6.16.0-rc1/tools/include/linux/kernel.h:91:59: note: expanded from macro 'ARRAY_SIZE' #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)) ^ /git/perf-6.16.0-rc1/tools/include/linux/compiler-gcc.h:26:28: note: expanded from macro '__must_be_array' #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) ^ /git/perf-6.16.0-rc1/tools/include/linux/build_bug.h:17:2: note: expanded from macro 'BUILD_BUG_ON_ZERO' __BUILD_BUG_ON_ZERO_MSG(e, ##__VA_ARGS__, #e " is true") ^ /git/perf-6.16.0-rc1/tools/include/linux/compiler.h:248:67: note: expanded from macro '__BUILD_BUG_ON_ZERO_MSG' #define __BUILD_BUG_ON_ZERO_MSG(e, msg, ...) ((int)sizeof(struct {_Static_assert(!(e), msg);})) ^ /usr/include/x86_64-linux-gnu/sys/cdefs.h:438:5: note: expanded from macro '_Static_assert' extern int (*__Static_assert_function (void)) \ ^ These also failed: toolsbuilder@five:~$ grep FAIL dm.log/summary \| grep clang 1 72.87 almalinux:8 : FAIL clang version 19.1.7 ( 19.1.7-2.module_el8.10.0+3990+33d0d926) 15 73.39 centos:stream : FAIL clang version 17.0.6 (Red Hat 17.0.6-1.module_el8+767+9fa966b8) 36 87.14 opensuse:15.4 : FAIL clang version 15.0.7 37 80.08 opensuse:15.5 : FAIL clang version 15.0.7 40 72.12 oraclelinux:8 : FAIL clang version 16.0.6 (Red Hat 16.0.6-2.0.1.module+el8.9.0+90129+d3ee8717) 42 74.12 rockylinux:8 : FAIL clang version 16.0.6 (Red Hat 16.0.6-2.module+el8.9.0+1651+e10a8f6d) toolsbuilder@five:~$ This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/build_bug.h include/linux/build_bug.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/aEszb7SSIJB6Lp6f@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16	tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'	Arnaldo Carvalho de Melo
	Also add SYM_PIC_ALIAS() to tools/perf/util/include/linux/linkage.h. This is to get the changes from: 419cbaf6a56a6e4b ("x86/boot: Add a bunch of PIC aliases") That addresses these perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/aEry7L3fibwIG5au@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>