Age | Commit message | Author
2025-06-17 | dpll: remove documentation of rclk_dev_name | Simon Horman
Remove documentation of the rclk_dev_name member of dpll_device, which doesn't exist. Flagged by ./scripts/kernel-doc -none. Introduced by commit 9431063ad323 ("dpll: core: Add DPLL framework base functions") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250616-dpll-member-v1-1-8c9e6b8e1fd4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski
Tony Nguyen says:
====================
libeth: add libeth_xdp helper lib
Alexander Lobakin says:
Time to add XDP helpers infra to libeth to greatly simplify adding XDP to idpf and iavf, as well as improve and extend XDP in ice and i40e. Any vendor is free to reuse the helpers. If this happens, I'm fine with moving the folder out of intel/.
The helpers greatly simplify building xdp_buff, running a prog, handling the verdict, and implementing XDP_TX, .ndo_xdp_xmit and XDP buffer completion. The same applies to XSk (with XSk xmit instead of .ndo_xdp_xmit, plus stuff like XSk wakeup). They are entirely generic, with no HW definitions or assumptions. HW-specific work such as parsing Rx descriptors or filling Tx descriptors is passed from the driver as inline callbacks.
For now, there are key assumptions that optimize performance / avoid code bloat, but might not fit every driver in drivers/net/:
* netmems holding the buffers are always order-0;
* the driver has separate XDP Tx queues and doesn't use stack queues for that. For best efficiency, you may want to have nr_cpu_ids XDP queues, but fewer (queue sharing) is also supported;
* XDP Tx queues are interrupt-less and use "lazy" cleaning only when fewer than 1/4 of the queue's Tx descriptors are free;
* the main target platforms are 64-bit; 32-bit is also fully supported, but the code might not be as optimized for it.
Library code already supports multi-buffer for all kinds of Tx, and both header split and no split for Rx and Tx. Frags can come from devmem/io_uring etc.; a direct `struct page *` is used only for header buffers, where that always holds. Drivers are free to pass their own Rx hints and XSK xmit hints ops.
XDP_TX and ndo_xdp_xmit use an on-stack bulk for the frames to be sent and send them in batches of 16 buffers. This eats ~280 bytes on the stack, but gives good boosts and allows the main sending function to be greatly optimized, leaving it without any error/exception paths.
XSk xmit fills Tx descriptors in a loop unrolled by 8. This was proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit don't use unrolling, as I wasn't able to get any improvements from it in those scenarios, while adding an extra 1 KB to their sending functions for nothing doesn't sound reasonable. XSk wakeup, instead of the "SW interrupts" traditionally provided by NICs, uses an IPI to schedule NAPI on the CPU corresponding to the given queue pair. It gives better control over CPU distribution, in general performs way better than "SW interrupts", and allows us not to pass any HW-specific callbacks there. The code is built so that all callbacks passed from drivers get inlined; in general, most of the hotpath gets inlined. Everything slow or exceptional goes into .c files in the libeth folder, so it isn't duplicated in the drivers themselves and doesn't bloat the hotpath. Sure, inlining means that the hotpath will be compiled into every driver that uses the lib, but the core code is written in one place, so no bugs get copied. Fixed once -- works everywhere. The last commit might look like a sort of hack, but it gives really good boosts and decreases object code size, plus there are checks that all those wider accesses are fully safe, so I don't feel anything bad about it. An example of using libeth_xdp can be found either on my GitHub or on the mailing lists here ("XDP for idpf"). Thanks to the macros for building driver XDP functions, some implementations (XDP_TX, ndo_xdp_xmit etc.) consist of only a few lines.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  libeth: xdp, xsk: access adjacent u32s as u64 where applicable
  libeth: xsk: add XSkFQ refill and XSk wakeup helpers
  libeth: xsk: add XSk Rx processing support
  libeth: xsk: add XSk xmit functions
  libeth: xsk: add XSk XDP_TX sending helpers
  libeth: xdp: add RSS hash hint and XDP features setup helpers
  libeth: xdp: add templates for building driver-side callbacks
  libeth: xdp: add XDP prog run and verdict result handling
  libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff
  libeth: xdp: add XDPSQ cleanup timers
  libeth: xdp: add XDPSQ locking helpers
  libeth: xdp: add XDPSQE completion helpers
  libeth: xdp: add .ndo_xdp_xmit() helpers
  libeth: xdp: add XDP_TX buffers sending
  libeth: support native XDP and register memory model
  libeth: convert to netmem
  libeth, libie: clean symbol exports up a little
====================
Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
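The on-stack bulk described in the cover letter (frames collected locally, then sent 16 at a time) can be modelled in userspace roughly as below. This is a minimal sketch, not libeth code: `struct xdp_frame_stub`, `struct tx_bulk`, `bulk_add()`, `bulk_flush()` and `BULK_SIZE` are hypothetical names, and "sending" only counts frames instead of filling descriptors.

```c
#include <stddef.h>

/* Hypothetical batch size, matching the "16 buffers" figure above. */
#define BULK_SIZE 16

/* Hypothetical frame descriptor; real code carries DMA addresses, lengths, etc. */
struct xdp_frame_stub {
	unsigned int len;
};

struct tx_bulk {
	struct xdp_frame_stub frames[BULK_SIZE]; /* meant to live on the caller's stack */
	size_t count;   /* frames currently queued */
	size_t flushes; /* how many times the send routine ran */
	size_t sent;    /* total frames handed to the "hardware" */
};

/* Send every queued frame in one go. A real driver would fill Tx
 * descriptors and ring the doorbell once here; this model only counts. */
static void bulk_flush(struct tx_bulk *bq)
{
	if (!bq->count)
		return;
	bq->sent += bq->count;
	bq->flushes++;
	bq->count = 0;
}

/* Queue one frame and flush automatically once the bulk is full. */
static void bulk_add(struct tx_bulk *bq, struct xdp_frame_stub frame)
{
	bq->frames[bq->count++] = frame;
	if (bq->count == BULK_SIZE)
		bulk_flush(bq);
}
```

A caller would queue frames during a poll and call `bulk_flush()` once at the end, so 40 frames cost three sends (16 + 16 + 8) rather than forty individual ones.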
2025-06-17 | Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy' | Jakub Kicinski
Mark Bloch says:
====================
net/mlx5e: Add support for devmem and io_uring TCP zero-copy
This series adds support for zerocopy rx TCP with devmem and io_uring for ConnectX7 NICs and above. For performance reasons and simplicity HW-GRO will also be turned on when header-data split mode is on.
Performance
===========
Test setup:
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA)
* NIC: ConnectX7
* Benchmarking tool: kperf [0]
* Single TCP flow
* Test duration: 60s
With application thread and interrupts pinned to the *same* core:
|------+-----------+----------|
| MTU  | epoll     | io_uring |
|------+-----------+----------|
| 1500 | 61.6 Gbps | 114 Gbps |
| 4096 | 69.3 Gbps | 151 Gbps |
| 9000 | 67.8 Gbps | 187 Gbps |
|------+-----------+----------|
The CPU usage for io_uring is 95%.
Reproduction steps for io_uring:
  server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \
      --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2
  server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc
  client --src 2001:db8::2 --dst 2001:db8::1 \
      --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2
Patch overview:
================
First, a netmem API for skb_can_coalesce is added to the core to be able to do skb fragment coalescing on netmems. The next patches introduce some cleanups in the internal SHAMPO code and improvements to hw gro capability checks in FW. A separate page_pool is introduced for headers, to be used only when the rxq has a memory provider. Then the driver is converted to use the netmem API and to allow support for unreadable netmem page pool. The queue management ops are implemented. Finally, the tcp-data-split ring parameter is exposed.
References
==========
[0] kperf: git://git.kernel.dk/kperf.git
v1: https://lore.kernel.org/20250116215530.158886-1-saeed@kernel.org
v2: https://lore.kernel.org/1747950086-1246773-1-git-send-email-tariqt@nvidia.com
v3: https://lore.kernel.org/20250609145833.990793-1-mbloch@nvidia.com
v4: https://lore.kernel.org/20250610150950.1094376-1-mbloch@nvidia.com
v5: https://lore.kernel.org/20250612154648.1161201-1-mbloch@nvidia.com
====================
Link: https://patch.msgid.link/20250616141441.1243044-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Add TX support for netmems | Dragos Tatulea
Declare netmem TX support in netdev. As required, use the netmem aware dma unmapping APIs for unmapping netmems in tx completion path. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-13-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Support ethtool tcp-data-split settings | Saeed Mahameed
In mlx5, tcp header-data split requires HW GRO to be on. Enabling it fails when HW GRO is off. mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled. Finally, when tcp data split is disabled, features are updated so that the forced HW GRO can be removed. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
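The interaction between the two settings can be illustrated with a small model of a fix_features-style callback. The flag names and both functions below are hypothetical and only mirror the rules stated in the commit: enabling split is refused while HW GRO is off, and while split is on, HW GRO is kept on even if the user tries to clear it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel uses netdev_features_t and NETIF_F_* flags. */
#define FEAT_HW_GRO (UINT64_C(1) << 0)
#define FEAT_RXCSUM (UINT64_C(1) << 1)

/* Model of a fix_features-style callback: header-data split depends on
 * HW GRO, so HW GRO stays on while split is enabled. Other bits pass through. */
static uint64_t fix_features(uint64_t requested, bool tcp_data_split)
{
	if (tcp_data_split)
		requested |= FEAT_HW_GRO;
	return requested;
}

/* Model of the ethtool path: turning split on is refused while HW GRO is off. */
static bool can_enable_data_split(uint64_t current_features)
{
	return (current_features & FEAT_HW_GRO) != 0;
}
```

Once split is disabled again, calling `fix_features()` with `tcp_data_split == false` simply returns what the user requested, which is how the forced HW GRO can go away.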
2025-06-17 | net/mlx5e: Implement queue mgmt ops and single channel swap | Saeed Mahameed
The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate and create the new channel resources, similar to mlx5e_safe_switch_params, but here we do it for a single channel using the existing params, effectively cloning the channel. To swap the old channel with the new one, we deactivate and close the old channel, then replace it with the new one. Since the swap procedure doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start). Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
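The alloc-then-swap split can be sketched as follows. Everything here is a hypothetical userspace model (`struct channel`, `struct priv`, `queue_mem_alloc()`, `queue_start()`), not mlx5 code; it only shows why doing all fallible work in the allocation step lets the swap itself have no error path.

```c
#include <stdlib.h>

struct channel {
	int ix;         /* queue index */
	int generation; /* bumped each time the channel is recreated */
};

struct priv {
	struct channel *channels[4];
};

/* Step 1: allocate a clone of channel `ix` using the existing params.
 * This is the only step that can fail (e.g. out of memory). */
static struct channel *queue_mem_alloc(const struct priv *p, int ix)
{
	struct channel *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->ix = ix;
	c->generation = p->channels[ix]->generation + 1;
	return c;
}

/* Step 2: close the old channel, then install the new one.
 * Nothing here can fail, so no rollback path is needed. */
static void queue_start(struct priv *p, int ix, struct channel *new_ch)
{
	free(p->channels[ix]);
	p->channels[ix] = new_ch;
}
```

If `queue_mem_alloc()` returns NULL the old channel is untouched, which is the property that makes a single-place, non-failing swap possible.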
2025-06-17 | net/mlx5e: Add support for UNREADABLE netmem page pools | Saeed Mahameed
On netdev_rx_queue_restart, a special type of page pool may be expected. In this patch, declare support for UNREADABLE netmem iov pages in the pool params only when header-data split SHAMPO RQ mode is enabled, and also set the queue index in the page pool params struct. SHAMPO mode requirement: without header split, RX needs to peek at the data, so we can't do UNREADABLE_NETMEM. The patch also enables the use of a separate page pool for headers when a memory provider is installed for the queue; otherwise the same common page pool continues to be used. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Convert over to netmem | Saeed Mahameed
mlx5e_page_frag holds the physical page itself. To naturally support zc page pools, remove the physical page reference from mlx5 and replace it with netmem_ref, avoiding internal handling in mlx5 for net_iov-backed pages. SHAMPO can issue packets that are not split into header and data. These packets will be dropped if the data part resides in a net_iov, as the driver can't read from that area. No performance degradation observed. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
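The drop rule for SHAMPO packets that arrive without a header/data split can be modelled as a single predicate. `struct rx_frag`, its `is_net_iov` field and `rx_should_drop()` are hypothetical names for this sketch; the idea taken from the commit is that the driver may only parse headers out of buffers the CPU is allowed to read.

```c
#include <stdbool.h>

struct rx_frag {
	bool is_net_iov; /* backed by devmem/io_uring memory the CPU must not read */
};

/* Return true if the packet must be dropped. */
static bool rx_should_drop(bool header_split, const struct rx_frag *data)
{
	/* With a split, headers sit in a readable header buffer, so the
	 * driver never needs to look into the data part. */
	if (header_split)
		return false;
	/* Without a split, headers would have to be parsed from the data
	 * buffer, which is impossible if that buffer is unreadable. */
	return data->is_net_iov;
}
```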
2025-06-17 | net/mlx5e: SHAMPO: Separate pool for headers | Saeed Mahameed
Allow allocating a separate page pool for headers when SHAMPO is on. This will be useful for adding support for zc page pools, which have to be different from the headers page pool. For now, the pools are the same. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Improve hw gro capability checking | Saeed Mahameed
Add missing HW capabilities and declare the feature in netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev. No functional change here, as all features that are disabled by default are explicitly disabled at the bottom of the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Remove redundant params | Saeed Mahameed
Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc | Saeed Mahameed
Drop redundant SHAMPO structure alloc/free functions. Gather together the function calls pertaining to header split info, and pass headers per WQE (hd_per_wqe) as a parameter to those functions to avoid future use-before-initialization mistakes. Allocate HW GRO related info outside of the header related info scope. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | page_pool: Add page_pool_dev_alloc_netmems helper | Dragos Tatulea
This is the netmem counterpart of page_pool_dev_alloc_pages() which uses the default GFP flags for RX. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: Add skb_can_coalesce for netmem | Dragos Tatulea
Allow drivers that have moved over to netmem to do fragment coalescing. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
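The coalescing test boils down to one condition: a new fragment can be merged into the last one if it refers to the same buffer and starts exactly where the last fragment ends. The sketch below models that condition with hypothetical types (`netmem_id` is a plain integer stand-in for `netmem_ref`, and `struct frag`/`can_coalesce()` are made up); it is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long netmem_id; /* hypothetical stand-in for netmem_ref */

struct frag {
	netmem_id netmem;
	unsigned int off;
	unsigned int size;
};

/* True if data at `off` in `netmem` directly continues the last fragment. */
static bool can_coalesce(const struct frag *frags, size_t nr_frags,
			 netmem_id netmem, unsigned int off)
{
	const struct frag *last;

	if (!nr_frags)
		return false;
	last = &frags[nr_frags - 1];
	return last->netmem == netmem && off == last->off + last->size;
}
```

When the check succeeds, a caller grows the last fragment's size instead of consuming another fragment slot.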
2025-06-17 | net: Allow const args for of page_to_netmem() | Dragos Tatulea
This allows calling page_to_netmem() with a const page * argument. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: tcp: tsq: Convert from tasklet to BH workqueue | Tejun Heo
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts TCP Small Queues implementation from tasklet to BH workqueue. Semantically, this is an equivalent conversion and there shouldn't be any user-visible behavior changes. While workqueue's queueing and execution paths are a bit heavier than tasklet's, unless the work item is being queued every packet, the difference hopefully shouldn't matter. My experience with the networking stack is very limited and this patch definitely needs attention from someone who actually understands networking. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/aFBeJ38AS1ZF3Dq5@slm.duckdns.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | Merge branch 'ipmr-ip6mr-allow-mc-routing-locally-generated-mc-packets' | Jakub Kicinski
Petr Machata says:
====================
ipmr, ip6mr: Allow MC-routing locally-generated MC packets
Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup. In practice that means that MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead.
To that end, this patchset adds support to route locally generated multicast packets.
However, an installation that uses a VXLAN underlay netdevice for which it also has matching MC routes would get different routing with this patch. Previously, the MC packets would be delivered directly to the underlay port, whereas now they would be MC-routed. In order to avoid this change in behavior, introduce an IPCB/IP6CB flag. Unless the flag is set, the new MC-routing code is skipped.
All this is keyed to a new VXLAN attribute, IFLA_VXLAN_MC_ROUTE. Only when it is set does any of the above engage.
In addition to that, and as is the case today with MC forwarding, IPV4_DEVCONF_MC_FORWARDING must be enabled for the netdevice that acts as a source of MC traffic (i.e. the VXLAN PHYS_DEV), so an MC daemon must be attached to the netdevice.
When a VXLAN netdevice with an MC remote is brought up, the physical netdevice joins the indicated MC group. This is important for local delivery of MC packets, so it is still necessary to configure a physical netdevice -- the parameter cannot go away. The netdevice would however typically not be a front panel port, but a dummy. An MC daemon would then sit on top of that netdevice as well as any front panel ports that it needs to service, and have routes set up between the two.
A way to configure the VXLAN netdevice to take advantage of the new MC routing would be:
  # ip link add name d up type dummy
  # ip link add name vx10 up type vxlan id 1000 dstport 4789 \
        local 192.0.2.1 group 225.0.0.1 ttl 16 dev d mcroute
  # ip link set dev vx10 master br # plus vlans etc.
With the following MC routes:
  (192.0.2.1, 225.0.0.1) iif=d oil=swp1,swp2 # TX route
  (*, 225.0.0.1) iif=swp1 oil=d,swp2         # RX route
  (*, 225.0.0.1) iif=swp2 oil=d,swp1         # RX route
The RX path has not changed, with the exception of an extra MC hop. Packets are delivered to the front panel port and MC-forwarded to the VXLAN physical port, here "d". Since the port has joined the multicast group, the packets are locally delivered, and end up being processed by the VXLAN netdevice.
This patchset is based on earlier patches from Nikolay Aleksandrov and Roopa Prabhu, though it underwent significant changes. Roopa broadly presented the topic at LPC 2019 [0].
Patchset progression:
- Patches #1 to #4 add ip_mr_output()
- Patches #5 to #10 add ip6_mr_output()
- Patch #11 adds the VXLAN bits to enable MR engagement
- Patches #12 to #14 prepare selftest libraries
- Patch #15 includes a new test suite
[0] https://www.youtube.com/watch?v=xlReECfi-uo
====================
Link: https://patch.msgid.link/cover.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
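The three routes in the example can be read as a lookup table. The sketch below models how an MR cache lookup might pick the output interface list for them: an exact (S, G) entry wins, a (*, G) entry for the same incoming interface is the fallback, and a miss is the case where the kernel would punt the packet to the MC daemon. The struct, table and `mr_lookup()` are hypothetical, interfaces are plain strings, and the matching is simplified compared to ipmr.

```c
#include <stddef.h>
#include <string.h>

struct mr_route {
	const char *src; /* NULL means wildcard (*) */
	const char *grp;
	const char *iif;
	const char *oil; /* comma-separated output interfaces */
};

static const struct mr_route routes[] = {
	{ "192.0.2.1", "225.0.0.1", "d",    "swp1,swp2" }, /* TX route */
	{ NULL,        "225.0.0.1", "swp1", "d,swp2"    }, /* RX route */
	{ NULL,        "225.0.0.1", "swp2", "d,swp1"    }, /* RX route */
};

/* Return the output list, or NULL if the daemon would have to resolve it. */
static const char *mr_lookup(const char *src, const char *grp, const char *iif)
{
	const struct mr_route *any = NULL;

	for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
		const struct mr_route *r = &routes[i];

		if (strcmp(r->grp, grp) || strcmp(r->iif, iif))
			continue;
		if (r->src && !strcmp(r->src, src))
			return r->oil; /* exact (S, G) match wins */
		if (!r->src && !any)
			any = r;       /* remember the first (*, G) match */
	}
	return any ? any->oil : NULL;
}
```

A locally generated packet from 192.0.2.1 entering via "d" hits the TX route and is replicated to swp1 and swp2, while traffic arriving on a front panel port falls through to the matching (*, G) RX route.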
2025-06-17 | selftests: forwarding: Add a test for verifying VXLAN MC underlay | Petr Machata
Add tests for MC-routing underlay VXLAN traffic. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/eecd2c0fefc754182e74be8e8e65751bf5749c21.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces | Petr Machata
Tests may wish to add other interfaces to listen on. Notably locally generated traffic uses dummy interfaces. The multicast daemon needs to know about these so that it allows forming rules that involve these interfaces, and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces. To that end, allow passing in a list of interfaces to configure in addition to all the physical ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/2e8d83297985933be4850f2b9f296b3c27110388.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftests: net: lib: Add ip_link_has_flag() | Petr Machata
Add a helper to determine whether a given netdevice has a given flag. Rewrite ip_link_is_up() in terms of the new helper. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/e1eb174a411f9d24735d095984c731d1d4a5a592.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftests: forwarding: lib: Move smcrouted helpers here | Petr Machata
router_multicast.sh has several helpers for working with smcrouted. Extract them to lib.sh so that other selftests can use them as well. Convert the helpers to defer in the process, because that simplifies the interface quite a bit. Therefore have router_multicast.sh invoke defer_scopes_cleanup() in its cleanup() function. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/410411c1a81225ce6e44542289b9c3ec21e5786c.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | vxlan: Support MC routing in the underlay | Petr Machata
Locally-generated MC packets have so far not been subject to MC routing. Instead, an MC-enabled installation would maintain the MC routing tables, and separately from that the list of interfaces to send packets to as part of the VXLAN FDB and MDB. In previous patches, ip_mr_output() and ip6_mr_output() routines were added for IPv4 and IPv6. All locally generated MC traffic is now passed through these functions. For reasons of backward compatibility, an SKB (IPCB / IP6CB) flag guards the actual MC routing. This patch adds logic to set the flag, and the UAPI to enable the behavior. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/d899655bb7e9b2521ee8c793e67056b9fd02ba12.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv6: Add ip6_mr_output() | Petr Machata
Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup, and MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead. To that end, this patch adds support to route locally generated multicast packets. The newly-added routines do largely what ip6_mr_input() and ip6_mr_forward() do: make an MR cache lookup to find where to send the packets, and use ip6_output() to send each of them. When no cache entry is found, the packet is punted to the daemon for resolution. Similarly to the IPv4 case in a previous patch, the new logic is contingent on a newly-added IP6CB flag being set. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/3bcc034a3ab4d3c291072fff38f78d7fbbeef4e6.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv6: ip6mr: Split ip6mr_forward2() in two | Petr Machata
Some of the work of ip6mr_forward2() is specific to IPMR forwarding, and should not take place on the output path. In order to allow reuse of the common parts, extract out of the function a helper, ip6mr_prepare_forward(). Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/8932bd5c0fbe3f662b158803b8509604fa7bc113.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv6: ip6mr: Make ip6mr_forward2() void | Petr Machata
Nobody uses the return value, so convert the function to void. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/e0bee259da0da58da96647ea9e21e6360c8f7e11.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv6: ip6mr: Fix in/out netdev to pass to the FORWARD chain | Petr Machata
The netfilter hook is invoked with skb->dev for the input netdevice, and vif_dev for the output netdevice. However, at the point of invocation, skb->dev is already set to vif_dev, and MR-forwarded packets are reported with in=out:
  # ip6tables -A FORWARD -j LOG --log-prefix '[forw]'
  # cd tools/testing/selftests/net/forwarding
  # ./router_multicast.sh
  # dmesg | fgrep '[forw]'
  [ 1670.248245] [forw]IN=v5 OUT=v5 [...]
For reference, IPv4 MR code shows in and out as appropriate. Fix by caching skb->dev before it is overwritten, passing the cached value as the input netdev and the updated value as the output netdev. Fixes: 7bc570c8b4f7 ("[IPV6] MROUTE: Support multicast forwarding.") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/3141ae8386fbe13fef4b793faa75e6bae58d798a.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
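The ordering bug can be reproduced in miniature: once `skb->dev` has been overwritten with the output device, reading it again as the input device yields in == out. The model below is hypothetical (`struct sk_buff_stub`, `nf_hook_stub()`, `forward_buggy()`, `forward_fixed()`, and a hook that merely records its arguments) and only illustrates the ordering problem and the caching fix.

```c
#include <string.h>

struct net_device_stub { const char *name; };
struct sk_buff_stub { const struct net_device_stub *dev; };

/* Stand-in for the netfilter FORWARD hook: records the devices it was shown. */
struct hook_log { const char *in; const char *out; };

static void nf_hook_stub(struct hook_log *rec, const struct net_device_stub *in,
			 const struct net_device_stub *out)
{
	rec->in = in->name;
	rec->out = out->name;
}

/* Buggy ordering: skb->dev already points at the output device. */
static void forward_buggy(struct sk_buff_stub *skb,
			  const struct net_device_stub *vif_dev,
			  struct hook_log *rec)
{
	skb->dev = vif_dev;
	nf_hook_stub(rec, skb->dev, vif_dev);
}

/* Fixed ordering: remember the input device before overwriting skb->dev. */
static void forward_fixed(struct sk_buff_stub *skb,
			  const struct net_device_stub *vif_dev,
			  struct hook_log *rec)
{
	const struct net_device_stub *indev = skb->dev;

	skb->dev = vif_dev;
	nf_hook_stub(rec, indev, skb->dev);
}
```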
2025-06-17 | net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb() | Petr Machata
ip6tunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IP6CB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel6_xmit_skb() as well. In one of the following patches, the VXLAN driver will use this facility to mark packets as subject to IPv6 multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/acb4f9f3e40c3a931236c3af08a720b017fbfbfb.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv6: Make udp_tunnel6_xmit_skb() void | Petr Machata
The function always returns zero, thus the return value does not carry any signal. Just make it void. Most callers already ignore the return value. However:
- Refold arguments of the call from sctp_v6_xmit() so that they fit into the 80-column limit.
- tipc_udp_xmit() initializes err from the return value, but that should already be always zero at that point. So there's no practical change, but elision of the assignment prompts a couple more tweaks to clean up the function.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/7facacf9d8ca3ca9391a4aee88160913671b868d.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv4: Add ip_mr_output() | Petr Machata
Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup, and MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead. To that end, this patch adds support to route locally generated multicast packets. The newly-added routines do largely what ip_mr_input() and ip_mr_forward() do: make an MR cache lookup to find where to send the packets, and use ip_mc_output() to send each of them. When no cache entry is found, the packet is punted to the daemon for resolution. However, an installation that uses a VXLAN underlay netdevice for which it also has matching MC routes would get different routing with this patch. Previously, the MC packets would be delivered directly to the underlay port, whereas now they would be MC-routed. In order to avoid this change in behavior, introduce an IPCB flag. Only if the flag is set will ip_mr_output() actually engage; otherwise it reverts to ip_mc_output(). This code is based on work by Roopa Prabhu and Nikolay Aleksandrov. Signed-off-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0aadbd49330471c0f758d54afb05eb3b6e3a6b65.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
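The backward-compatibility gate amounts to a simple dispatch on a per-packet flag. In this sketch the flag name `CB_FLAG_MCROUTE`, `struct skb_cb`, the `enum out_path` values and `mr_output()` are all hypothetical; only the shape (MC-route a packet only when the tunnel marked it, otherwise behave exactly as before) follows the commit message.

```c
/* Hypothetical stand-in for the new IPCB flag. */
#define CB_FLAG_MCROUTE 0x1u

struct skb_cb { unsigned int flags; };

enum out_path { OUT_MC_OUTPUT, OUT_MR_ROUTED };

/* Model of the ip_mr_output() decision. */
static enum out_path mr_output(const struct skb_cb *cb)
{
	if (!(cb->flags & CB_FLAG_MCROUTE))
		return OUT_MC_OUTPUT; /* legacy behaviour: plain ip_mc_output() */
	return OUT_MR_ROUTED;         /* MR cache lookup, then per-vif transmit */
}
```

Because unmarked packets take the legacy branch, existing setups that never enable the VXLAN attribute see no change.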
2025-06-17 | net: ipv4: ipmr: Split ipmr_queue_xmit() in two | Petr Machata
Some of the work of ipmr_queue_xmit() is specific to IPMR forwarding, and should not take place on the output path. In order to allow reuse of the common parts, split the function into two: the ipmr_prepare_xmit() helper that takes care of the common bits, and the ipmr_queue_fwd_xmit(), which invokes the former and encapsulates the whole forwarding algorithm. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/4e8db165572a4f8bd29a723a801e854e9d20df4d.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv4: ipmr: ipmr_queue_xmit(): Drop local variable `dev' | Petr Machata
The variable is used for caching of rt->dst.dev. The netdevice referenced therein does not change during the scope of validity of that local. At the same time, the local is only used twice, and each of these uses will end up in a different function in the following patches, further eliminating any use the local could have had. Drop the local altogether and inline the uses. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/c80600a4b51679fe78f429ccb6d60892c2f9e4de.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb() | Petr Machata
iptunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IPCB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel_xmit_skb() as well. In one of the following patches, the VXLAN driver will use this facility to mark packets as subject to IP multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/89c9daf9f2dc088b6b92ccebcc929f51742de91f.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
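Why the flag has to travel as a parameter, rather than being set by the caller beforehand, can be shown with a tiny model: the transmit step wipes the control block, so anything set earlier is lost unless it is re-applied afterwards. `struct ip_cb_stub` and `tunnel_xmit_stub()` are hypothetical names for this sketch.

```c
#include <string.h>

/* Hypothetical control block with a flags word and other scratch state. */
struct ip_cb_stub {
	unsigned int flags;
	int iif;
	unsigned char opt[16];
};

/* Model of the tunnel xmit step: the control block is cleared, then the
 * caller-supplied flags are written back so they survive the wipe. */
static void tunnel_xmit_stub(struct ip_cb_stub *cb, unsigned int ipcb_flags)
{
	memset(cb, 0, sizeof(*cb));
	cb->flags = ipcb_flags;
}
```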
2025-06-17 | Merge branch 'misc-vlan-cleanups' | Jakub Kicinski
Gal Pressman says:
====================
Misc vlan cleanups
This patch series addresses compilation issues with objtool when VLAN support is disabled (CONFIG_VLAN_8021Q=n) and makes related improvements to the VLAN infrastructure.
When CONFIG_VLAN_8021Q=n, CONFIG_OBJTOOL=y, and CONFIG_OBJTOOL_WERROR=y, the following compilation error occurs:
  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: error: objtool: parse_mirred.isra.0+0x370: mlx5e_tc_act_vlan_add_push_action() missing __noreturn in .c/.h or NORETURN() in noreturns.h
The error occurs because objtool cannot determine that unreachable BUG() calls in VLAN code paths are actually dead code when VLAN support is disabled.
The first patch makes is_vlan_dev() a stub when VLAN is not configured, which allows VLAN-dependent dead code paths to be compiled out and resolves the objtool compilation error.
The second patch replaces BUG() calls with WARN_ON_ONCE(), as the usage of BUG() should be avoided.
The third patch uses the "kernel" way of testing whether an option is configured as builtin/module, instead of open-coding it.
v2: https://lore.kernel.org/20250610072611.1647593-1-gal@nvidia.com/
====================
Link: https://patch.msgid.link/20250616132626.1749331-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: vlan: Use IS_ENABLED() helper for CONFIG_VLAN_8021Q guard | Gal Pressman
The header currently tests the VLAN core with an explicit pair of 'if defined' checks:

#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)

Instead, use IS_ENABLED(), which is the kernel way to test whether an option is configured as builtin/module.

This is purely cosmetic; there are no functional changes.

Reviewed-by: Alex Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
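For readers unfamiliar with IS_ENABLED(), the standalone sketch below copies the preprocessor helpers from include/linux/kconfig.h and compares the old and new guards. The simulated configuration (CONFIG_VLAN_8021Q built as a module) and the helper names `vlan_guard_old()`/`vlan_guard_new()` are assumptions for illustration. Kconfig emits `CONFIG_FOO 1` for `=y` and `CONFIG_FOO_MODULE 1` for `=m`, and IS_ENABLED() folds both checks into one expression that evaluates to 0 or 1, usable in `#if` as well as in C code.

```c
/* Helper macros as defined in include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val

#define __or(x, y)			___or(x, y)
#define ___or(x, y)			____or(__ARG_PLACEHOLDER_##x, y)
#define ____or(arg1_or_junk, y)		__take_second_arg(arg1_or_junk 1, y)

#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option)  __is_defined(option##_MODULE)
#define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option))

/* Assumed configuration for this example: CONFIG_VLAN_8021Q=m. */
#define CONFIG_VLAN_8021Q_MODULE 1

/* Old, open-coded guard. */
static int vlan_guard_old(void)
{
#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
	return 1;
#else
	return 0;
#endif
}

/* New guard: same result, expressed with IS_ENABLED(). */
static int vlan_guard_new(void)
{
#if IS_ENABLED(CONFIG_VLAN_8021Q)
	return 1;
#else
	return 0;
#endif
}
```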
2025-06-17 | net: vlan: Replace BUG() with WARN_ON_ONCE() in vlan_dev_* stubs | Gal Pressman
When CONFIG_VLAN_8021Q=n, a set of stub helpers is used, and three of these helpers call BUG() unconditionally.

This code should not be reached, as callers of these functions should always check is_vlan_dev() first, but the usage of BUG() is not recommended; replace it with WARN_ON_ONCE() instead.

Reviewed-by: Alex Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
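The behavioral difference can be sketched in userspace C. `mock_warn_once()` is a simplified, hypothetical stand-in for WARN_ON_ONCE(1) (it prints one line to stderr instead of a stack trace), and `mock_vlan_dev_vlan_id()` stands in for one of the vlan_dev_* stubs; returning 0 on this path is an assumption of the sketch. Unlike BUG(), which stops execution, the warning path reports the misuse once and returns control to the caller.

```c
#include <stdbool.h>
#include <stdio.h>

struct mock_net_device { int ifindex; };

static int mock_warn_count; /* number of warnings actually emitted */

/* Hypothetical stand-in for WARN_ON_ONCE(1): report once, then carry on. */
static void mock_warn_once(bool *warned, const char *func)
{
	if (*warned)
		return;
	*warned = true;
	mock_warn_count++;
	fprintf(stderr, "WARNING: %s() reached without VLAN support\n", func);
}

/*
 * Stub for the VLAN-disabled build. Callers should check is_vlan_dev()
 * first, so reaching this is a bug; instead of halting like BUG(), it
 * warns once and returns a harmless value.
 */
static unsigned short mock_vlan_dev_vlan_id(const struct mock_net_device *dev)
{
	static bool warned;

	(void)dev;
	mock_warn_once(&warned, __func__);
	return 0;
}
```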
2025-06-17 | net: vlan: Make is_vlan_dev() a stub when VLAN is not configured | Gal Pressman
Add a stub implementation of is_vlan_dev() that returns false when VLAN support is not compiled in (CONFIG_VLAN_8021Q=n).

This allows us to compile out VLAN-dependent dead code when it is not needed.

This also resolves the following compilation error when:
* CONFIG_VLAN_8021Q=n
* CONFIG_OBJTOOL=y
* CONFIG_OBJTOOL_WERROR=y

drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: error: objtool: parse_mirred.isra.0+0x370: mlx5e_tc_act_vlan_add_push_action() missing __noreturn in .c/.h or NORETURN() in noreturns.h

The error occurs because objtool cannot determine that unreachable BUG() calls (BUG() doesn't return) in VLAN code paths are actually dead code when VLAN support is disabled.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
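A minimal sketch of the stub pattern, assuming a hypothetical `MOCK_VLAN_ENABLED` switch in place of CONFIG_VLAN_8021Q and a hypothetical `MOCK_IFF_802_1Q_VLAN` device flag. When the stub returns a compile-time constant false, the compiler can see that the guarded branch is unreachable and drop it, along with any no-return call such as BUG() that it contains.

```c
#include <stdbool.h>

/* Hypothetical configuration switch: 0 models CONFIG_VLAN_8021Q=n. */
#define MOCK_VLAN_ENABLED 0

#define MOCK_IFF_802_1Q_VLAN 0x1 /* hypothetical private flag */

struct mock_net_device {
	unsigned int priv_flags;
	unsigned short vlan_id;
};

#if MOCK_VLAN_ENABLED
static inline bool mock_is_vlan_dev(const struct mock_net_device *dev)
{
	return dev->priv_flags & MOCK_IFF_802_1Q_VLAN;
}
#else
/* Stub: a constant, so callers' VLAN-only branches become dead code. */
static inline bool mock_is_vlan_dev(const struct mock_net_device *dev)
{
	(void)dev;
	return false;
}
#endif

/* Typical caller: touches VLAN state only after checking the device type. */
static int mock_vlan_id_or_zero(const struct mock_net_device *dev)
{
	if (mock_is_vlan_dev(dev))
		return dev->vlan_id; /* removable when the stub returns false */
	return 0;
}
```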
2025-06-17 | Merge branch 'net-use-new-gpio-line-value-setter-callbacks' | Jakub Kicinski
Bartosz Golaszewski says:

====================
net: use new GPIO line value setter callbacks

Commit 98ce1eb1fd87e ("gpiolib: introduce gpio_chip setters that return values") added new line setter callbacks to struct gpio_chip. They allow drivers to indicate failures to callers. We're in the process of converting all GPIO controllers to use them before removing the old ones. This series converts all GPIO chips implemented under drivers/net/.

v1: https://lore.kernel.org/20250610-gpiochip-set-rv-net-v1-0-35668dd1c76f@linaro.org
====================

Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-0-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
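The shape of the conversion can be sketched with mock types. `struct mock_gpio_chip`, `mock_bus_write()`, the driver callbacks and `mock_gpio_set_value()` are hypothetical and only loosely modeled on gpiolib; `set` and `set_rv` stand for the old void-returning and new int-returning setters. The point is that a bus write failure, previously swallowed inside a void callback, now reaches the caller.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock of a GPIO controller and its setter callbacks. */
struct mock_gpio_chip {
	void (*set)(struct mock_gpio_chip *gc, unsigned int offset, int value);
	int (*set_rv)(struct mock_gpio_chip *gc, unsigned int offset, int value);
	bool bus_ok;        /* simulates whether the underlying bus write works */
	unsigned int state; /* bitmap of line values */
};

/* Hypothetical bus accessor that can fail (e.g. an SPI or MDIO write). */
static int mock_bus_write(struct mock_gpio_chip *gc, unsigned int offset, int value)
{
	if (!gc->bus_ok)
		return -EIO;
	if (value)
		gc->state |= 1u << offset;
	else
		gc->state &= ~(1u << offset);
	return 0;
}

/* Old-style callback: the bus error has nowhere to go. */
static void mock_drv_set(struct mock_gpio_chip *gc, unsigned int offset, int value)
{
	(void)mock_bus_write(gc, offset, value);
}

/* New-style callback: the bus error is returned to the caller. */
static int mock_drv_set_rv(struct mock_gpio_chip *gc, unsigned int offset, int value)
{
	return mock_bus_write(gc, offset, value);
}

/* Hypothetical core helper: prefers the value-returning setter if present. */
static int mock_gpio_set_value(struct mock_gpio_chip *gc, unsigned int offset, int value)
{
	if (gc->set_rv)
		return gc->set_rv(gc, offset, value);
	gc->set(gc, offset, value);
	return 0; /* any failure is invisible on this path */
}
```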
2025-06-17 | net: phy: qca807x: use new GPIO line value setter callbacks | Bartosz Golaszewski
struct gpio_chip now has callbacks for setting line values that return an integer, allowing drivers to indicate failures. Convert the driver to use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-5-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: can: mcp251x: use new GPIO line value setter callbacks | Bartosz Golaszewski
struct gpio_chip now has callbacks for setting line values that return an integer, allowing drivers to indicate failures. Convert the driver to use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-4-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: can: mcp251x: propagate the return value of mcp251x_spi_write() | Bartosz Golaszewski
Add an integer return value to mcp251x_write_bits() and use it to propagate the one returned by mcp251x_spi_write(). Return that value on error in the request() GPIO callback.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-3-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
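A sketch of the propagation described above, using hypothetical mock functions in place of the driver's helpers (`mock_spi_write()`, `mock_write_bits()`, `mock_gpio_request()`); the register index, the software masking and the failure switch are invented for illustration. Once the bit-update helper returns an int, the GPIO request() callback can fail instead of reporting success after a broken SPI transfer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct mock_priv {
	bool spi_fails;  /* simulates a failing SPI transfer */
	uint8_t reg[16]; /* mock register file */
};

/* Hypothetical low-level SPI write: returns 0 or a negative errno. */
static int mock_spi_write(struct mock_priv *priv, uint8_t reg, uint8_t val)
{
	if (priv->spi_fails)
		return -EIO;
	priv->reg[reg & 0xf] = val;
	return 0;
}

/* Update only the bits in mask (done in software in this mock). */
static int mock_write_bits(struct mock_priv *priv, uint8_t reg, uint8_t mask, uint8_t val)
{
	uint8_t cur = priv->reg[reg & 0xf];

	/* Previously void; now hands the SPI result back to the caller. */
	return mock_spi_write(priv, reg, (uint8_t)((cur & ~mask) | (val & mask)));
}

/* GPIO request() callback: configure the pin, returning any error. */
static int mock_gpio_request(struct mock_priv *priv, unsigned int offset)
{
	uint8_t bit = (uint8_t)(1u << offset);
	int ret;

	ret = mock_write_bits(priv, 0x0c, bit, bit);
	if (ret)
		return ret;
	return 0;
}
```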
2025-06-17 | net: dsa: mt7530: use new GPIO line value setter callbacks | Bartosz Golaszewski
struct gpio_chip now has callbacks for setting line values that return an integer, allowing drivers to indicate failures. Convert the driver to use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-2-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: dsa: vsc73xx: use new GPIO line value setter callbacks | Bartosz Golaszewski
struct gpio_chip now has callbacks for setting line values that return an integer, allowing drivers to indicate failures. Convert the driver to use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-1-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | gve: Return error for unknown admin queue command | Alok Tiwari
In gve_adminq_issue_cmd(), return -EINVAL instead of 0 when an unknown admin queue command opcode is encountered. This keeps the function from silently succeeding on invalid input and avoids undefined behavior: it now fails gracefully when an unrecognized opcode is provided.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250616054504.1644770-2-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
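The shape of the fix can be sketched as a dispatch switch whose default case reports -EINVAL instead of falling through to a success return. The opcodes and `mock_adminq_issue_cmd()` are hypothetical and do not reflect the actual gve command set.

```c
#include <errno.h>

/* Hypothetical admin queue opcodes. */
enum mock_adminq_opcode {
	MOCK_ADMINQ_DESCRIBE_DEVICE = 0x1,
	MOCK_ADMINQ_CREATE_TX_QUEUE = 0x2,
	MOCK_ADMINQ_CREATE_RX_QUEUE = 0x3,
};

static int mock_issued; /* count of commands actually submitted */

static int mock_adminq_issue_cmd(unsigned int opcode)
{
	switch (opcode) {
	case MOCK_ADMINQ_DESCRIBE_DEVICE:
	case MOCK_ADMINQ_CREATE_TX_QUEUE:
	case MOCK_ADMINQ_CREATE_RX_QUEUE:
		/* ... copy the command into the queue and ring the doorbell ... */
		mock_issued++;
		return 0;
	default:
		/* Unknown opcode: fail loudly rather than report success. */
		return -EINVAL;
	}
}
```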
2025-06-17 | gve: Fix various typos and improve code comments | Alok Tiwari
- Correct spelling and improve the clarity of comments:
  "confiugration" -> "configuration"
  "spilt" -> "split"
  "It if is 0" -> "If it is 0"
  "DQ" -> "DQO" (correct abbreviation)
- Clarify BIT(0) flag usage in gve_get_priv_flags()
- Replace hardcoded array size with GVE_NUM_PTYPES for clarity and maintainability.

These changes are purely cosmetic and do not affect functionality.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250616054504.1644770-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftests: devmem: add ipv4 support to chunks test | Mina Almasry
Add IPv4 support to the recently added chunks test, which was IPv6-only.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftests: devmem: remove unused variable | Mina Almasry
Trivial fix: remove an unused variable.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | netmem: fix netmem comments | Mina Almasry
Trivial fix to a couple of outdated netmem comments. No code changes; the comments now describe the current code more accurately.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | selftest: Add selftest for multicast address notifications | Yuyang Huang
This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and removing a dummy interface and then confirming that the system correctly receives join and removal notifications for the 224.0.0.1 and ff02::1 multicast addresses.

The test requires iproute2 6.13 or later.

Tested by the following command:

$ vng -v --user root --cpus 16 -- \
    make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink_notification.sh \
    TEST_GEN_PROGS="" run_tests

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20250614053522.623820-1-yuyanghuang@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | Merge branch 'net-dsa-b53-fix-bcm5325-support' | Jakub Kicinski
Álvaro Fernández Rojas says:

====================
net: dsa: b53: fix BCM5325 support

These patches get the BCM5325 switch working with b53.

The existing brcm legacy tag only works with BCM63xx switches. We need to add a new legacy tag for BCM5325 and BCM5365 switches, which require including the FCS and length.

I'm not really sure that everything here is correct, since I don't work for Broadcom and all this is based on the public datasheet available for the BCM5325 and my own experiments with a Huawei HG556a (BCM6358).

Both sets of patches have been merged due to the change requested by Jonas about BRCM_HDR register access depending on legacy tags.
====================

Link: https://patch.msgid.link/20250614080000.1884236-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net: dsa: b53: ensure BCM5325 PHYs are enabled | Álvaro Fernández Rojas
According to the datasheet, BCM5325 uses the B53_PD_MODE_CTRL_25 register to disable clocking to individual PHYs. Only ports 1-4 can be enabled or disabled, and the datasheet explicitly warns against toggling BIT(0), since it disables the PLL power and the switch.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-15-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
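A sketch of the register update under explicit assumptions: that bit N of B53_PD_MODE_CTRL_25 powers down the PHY of port N when set (the commit only states that the register disables clocking to individual PHYs), and that the caller reads and writes the byte elsewhere. `mock_pd_mode_enable_phys()` and the `MOCK_*` macros are hypothetical. Whatever the exact layout, the sketch keeps the datasheet constraint from the commit: only ports 1-4 are touched and BIT(0) is never modified.

```c
#include <stdint.h>

#define MOCK_BIT(n) (1u << (n))

/* Assumption: bit N set => PHY of port N powered down (ports 1-4 only). */
#define MOCK_PD_PORT_FIRST 1
#define MOCK_PD_PORT_LAST  4
#define MOCK_PD_PLL_BIT    MOCK_BIT(0) /* per the datasheet: never toggle */

/* Return a new register value with the PHYs of ports 1-4 enabled. */
static uint8_t mock_pd_mode_enable_phys(uint8_t reg)
{
	uint8_t mask = 0;
	int port;

	for (port = MOCK_PD_PORT_FIRST; port <= MOCK_PD_PORT_LAST; port++)
		mask = (uint8_t)(mask | MOCK_BIT(port));

	/* Clear only the per-port power-down bits; BIT(0) is preserved. */
	return (uint8_t)(reg & ~mask);
}
```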