Age, Commit message, Author
2025-02-24net: txgbe: Add basic support for new AML devicesJiawen Wu
There is a new 40/25/10 Gigabit Ethernet device. To support basic functions, PHYLINK is temporarily skipped, since these configurations are intended to be implemented in the firmware. The associated link IRQ is also skipped. Also implement the new SW-FW interaction interface, which uses a 64-byte message buffer. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250221065718.197544-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: ethernet: renesas: rcar_gen4_ptp: Remove bool conversionThorsten Blum
Remove the unnecessary bool conversion and simplify the code. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20250223233613.100518-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: Remove shadow variable in netdev_run_todo()Breno Leitao
Fix a shadow variable warning in net/core/dev.c when compiled with CONFIG_LOCKDEP enabled. The warning occurs because 'dev' is redeclared inside the while loop, shadowing the outer scope declaration.

net/core/dev.c:11211:22: warning: declaration shadows a local variable [-Wshadow]
    struct net_device *dev = list_first_entry(&unlink_list,
net/core/dev.c:11202:21: note: previous declaration is here
    struct net_device *dev, *tmp;

Remove the redundant declaration since the variable is already defined in the outer scope and will be overwritten in the subsequent list_for_each_entry_safe() loop anyway.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250221-netcons_fix_shadow-v1-1-dee20c8658dd@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24Merge branch 'net-stmmac-thead-clean-up-clock-rate-setting'Jakub Kicinski
Russell King says: ==================== net: stmmac: thead: clean up clock rate setting This series cleans up the thead clock rate setting to use the rgmii_clock() helper function added to phylib. The first patch switches over to using the rgmii_clock() helper, and the second patch cleans up the verification that the desired clock rate is achievable, allowing the private clock rate definitions to be removed. ==================== Tested-by: Drew Fustini <drew@pdp7.com> Link: https://patch.msgid.link/Z7iKdaCp4hLWWgJ2@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: stmmac: thead: ensure divisor gives proper rateRussell King (Oracle)
thead was checking that the stmmac_clk rate was a multiple of the RGMII rates for 1G and 100M, but didn't check for 10M. Rather than use this with hard-coded speeds, check that the calculated divisor gives the required rate by multiplying the transmit clock rate back up to the stmmac clock rate and checking that it agrees. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Drew Fustini <drew@pdp7.com> Link: https://patch.msgid.link/E1tlToD-004W3g-HB@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
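The multiply-back check can be sketched in a few lines. This is a Python model rather than the driver's C code: `rgmii_divisor` and `RGMII_TXC_RATE_HZ` are hypothetical names, and the parent clock rates used in the usage note are made-up values. The 125 MHz, 25 MHz and 2.5 MHz figures are the standard RGMII transmit clock rates for 1G, 100M and 10M.

```python
# Standard RGMII transmit clock rates in Hz, keyed by link speed in Mbps.
RGMII_TXC_RATE_HZ = {1000: 125_000_000, 100: 25_000_000, 10: 2_500_000}


def rgmii_divisor(parent_rate_hz, speed_mbps):
    """Return the integer divisor for this speed, or None if unachievable."""
    txc_rate = RGMII_TXC_RATE_HZ.get(speed_mbps)
    if txc_rate is None:
        return None  # not an RGMII speed
    div = parent_rate_hz // txc_rate
    # Multiply the transmit clock rate back up: the divisor is only valid
    # if it reproduces the parent clock rate exactly.
    if div == 0 or div * txc_rate != parent_rate_hz:
        return None
    return div
```

With a hypothetical 500 MHz parent clock all three speeds divide evenly, while a 130 MHz parent is rejected for 1G because 1 * 125 MHz does not equal 130 MHz.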
2025-02-24net: stmmac: thead: use rgmii_clock() for RGMII clock rateRussell King (Oracle)
Switch to using rgmii_clock() to get the RGMII TXC rate, and calculate the divisor from the parent clock rate and the rate indicated by rgmii_clock(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Drew Fustini <drew@pdp7.com> Link: https://patch.msgid.link/E1tlTo8-004W3a-CO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24Merge branch 'net-remove-skb_flow_get_ports'Jakub Kicinski
Nicolas Dichtel says: ==================== net: remove skb_flow_get_ports() Remove skb_flow_get_ports() and rename __skb_flow_get_ports() to skb_flow_get_ports(). ==================== Link: https://patch.msgid.link/20250221110941.2041629-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: remove '__' from __skb_flow_get_ports()Nicolas Dichtel
Only one version of skb_flow_get_ports() exists after the previous commit, so let's remove the useless '__'. Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250221110941.2041629-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24skbuff: kill skb_flow_get_ports()Nicolas Dichtel
Since commit a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff"), this function is not used anymore. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250221110941.2041629-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: stmmac: Correct usage of maximum queue number macrosKunihiko Hayashi
The maximum numbers of Rx and Tx queues are defined by MTL_MAX_RX_QUEUES and MTL_MAX_TX_QUEUES respectively. There are some places where the Rx and Tx macros are used in reverse. This causes no issue while the Tx and Rx macros have the same value, but the usage of the maximum queue number macros should be corrected to keep consistency and prevent unexpected mistakes. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Link: https://patch.msgid.link/20250221051818.4163678-1-hayashi.kunihiko@socionext.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net-sysfs: restore behavior for not running devicesEric Dumazet
modprobe dummy numdummies=1

Old behavior:

$ cat /sys/class/net/dummy0/carrier
cat: /sys/class/net/dummy0/carrier: Invalid argument

After blamed commit, an empty string is reported.

$ cat /sys/class/net/dummy0/carrier
$

In this commit, I restore the old behavior for carrier, speed and duplex attributes.
Fixes: 79c61899b5ee ("net-sysfs: remove rtnl_trylock from device attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Leogrande <leogrande@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250221051223.576726-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
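A user-space reader of these attributes has to cope with the restored error. The sketch below is illustrative only: `read_carrier` is a hypothetical helper name, and it assumes the read fails with EINVAL for a device that is not running, as in the transcript of the old behavior.

```python
import errno


def read_carrier(path):
    """Return the carrier value as a string, or None if the read gives EINVAL."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None  # assumed meaning: the device is not running
        raise  # any other error is unexpected and propagated
```

A caller would pass a path such as /sys/class/net/dummy0/carrier and treat None as "no carrier information available".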
2025-02-24net: stmmac: qcom-ethqos: use rgmii_clock() to set the link clockRussell King (Oracle)
The link clock operates at twice the RGMII clock rate. Therefore, we can use the rgmii_clock() helper to set this clock rate. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tlRMK-004Vsx-Ss@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24virtio-net: tweak for better TX performance in NAPI modeJason Wang
There are several issues in start_xmit():
- Transmitted packets need to be freed before sending a packet; this introduces delay and increases the average packet transmit time. It also increases the time spent holding the TX lock.
- Notification is enabled after free_old_xmit_skbs(), which will introduce unnecessary interrupts if TX notification happens on the same CPU that is doing the transmission now (actually, the virtio-net driver is optimized for this case).
So this patch tries to avoid those issues by not cleaning transmitted packets in start_xmit() when TX NAPI is enabled and by disabling notifications even more aggressively. Notification will be disabled from the beginning of start_xmit(). But we can't enable delayed notification after TX is stopped as we will lose the notifications. Instead, the delayed notification is enabled after the virtqueue is kicked for best performance.
Performance numbers:
1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh (burst 256) + testpmd (rxonly) on the host:
- When pinning TX IRQ to pktgen VCPU: split virtqueue PPS were increased 55% from 6.89 Mpps to 10.7 Mpps and 32% TX interrupts were eliminated. Packed virtqueue PPS were increased 50% from 7.09 Mpps to 10.7 Mpps, 99% TX interrupts were eliminated.
- When pinning TX IRQ to VCPU other than pktgen: split virtqueue PPS were increased 96% from 5.29 Mpps to 10.4 Mpps and 45% TX interrupts were eliminated; Packed virtqueue PPS were increased 78% from 6.12 Mpps to 10.9 Mpps and 99% TX interrupts were eliminated.
2) single queue 1 vcpu guest + vhost-net/TAP on the host: single session netperf from guest to host shows 82% improvement from 31Gb/s to 58Gb/s, %stddev were reduced from 34.5% to 1.9% and 88% of TX interrupts were eliminated.
Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-23net/mlx5: Change POOL_NEXT_SIZE define value and make it globalPatrisious Haddad
Change the POOL_NEXT_SIZE define value from 0 to BIT(30). This define is used to request the largest available flow table, so zero doesn't make sense for it: some places in the driver use zero explicitly, expecting the smallest possible table size, but because of this define they unknowingly end up allocating the biggest table size. In addition, move the definition to "include/linux/mlx5/fs.h" to expose the define to the IB driver as well, while appropriately renaming it. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23net/mlx5: Add new health syndrome error and crr bit offsetShahar Shitrit
Add new error value for trust lockdown in health syndrome enum. Also, include the offset for crr bit in the health buffer layout. These changes prepare for downstream patches that update health event handling. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219085808.349923-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-21Merge branch 'mctp-add-mctp-over-usb-hardware-transport-binding'Jakub Kicinski
Jeremy Kerr says: ==================== mctp: Add MCTP-over-USB hardware transport binding Add an implementation of the DMTF standard DSP0283, providing an MCTP channel over high-speed USB. This is a fairly trivial first implementation, in that we only submit one tx and one rx URB at a time. We do accept multi-packet transfers, but do not yet generate them on transmit. Of course, questions and comments are most welcome, particularly on the USB interfaces. v2: https://lore.kernel.org/20250212-dev-mctp-usb-v2-0-76e67025d764@codeconstruct.com.au v1: https://lore.kernel.org/20250206-dev-mctp-usb-v1-0-81453fe26a61@codeconstruct.com.au ==================== Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-0-3353030fe9cc@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: mctp: Add MCTP USB transport driverJeremy Kerr
Add an implementation for DMTF DSP0283, which defines an MCTP-over-USB transport. As per that spec, we're restricted to high-speed mode, requiring 512-byte transfers. Each MCTP-over-USB interface is a peer-to-peer link to a single MCTP endpoint, so no physical addressing is required (of course, that MCTP endpoint may then bridge to further MCTP endpoints). Consequently, interfaces will report with no lladdr data:

# mctp link
dev lo index 1 address 00:00:00:00:00:00 net 1 mtu 65536 up
dev mctpusb0 index 6 address none net 1 mtu 68 up

This is a simple initial implementation, with single rx & tx urbs, and no multi-packet tx transfers - although we do accept multi-packet rx from the device. Includes suggested fixes from Santosh Puranik <spuranik@nvidia.com>.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Cc: Santosh Puranik <spuranik@nvidia.com> Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-2-3353030fe9cc@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21usb: Add base USB MCTP definitionsJeremy Kerr
Upcoming changes will add a USB host (and later gadget) driver for the MCTP-over-USB protocol. Add a header that provides common definitions for protocol support: the packet header format and a few framing definitions. Add a define for the MCTP class code, as per https://usb.org/defined-class-codes. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: cadence: macb: Implement BQLSean Anderson
Implement byte queue limits to allow queuing disciplines to account for packets enqueued in the ring buffer but not yet transmitted. There is a separate set of transmit functions for AT91 that I haven't touched since I don't have hardware to test on. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250220164257.96859-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
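The accounting idea behind byte queue limits can be illustrated with a toy model. This is a simplified Python sketch, not the macb change: `TxQueueModel` and its methods are hypothetical, and the limit is fixed here, whereas the kernel's BQL adapts it dynamically. In a driver, the two events modelled below are normally reported with netdev_tx_sent_queue() and netdev_tx_completed_queue().

```python
class TxQueueModel:
    """Toy model of byte-based TX queue accounting with a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.inflight_bytes = 0  # enqueued in the ring but not yet transmitted
        self.stopped = False

    def sent(self, nbytes):
        # A packet was placed in the ring buffer.
        self.inflight_bytes += nbytes
        if self.inflight_bytes >= self.limit_bytes:
            self.stopped = True  # ask the upper layer to stop queuing

    def completed(self, nbytes):
        # The hardware reported these bytes as transmitted.
        self.inflight_bytes -= nbytes
        if self.inflight_bytes < self.limit_bytes:
            self.stopped = False  # room again, queuing may resume
```

Because the queue stops once enough bytes are in flight, excess packets wait in the queuing discipline, where they can still be scheduled or dropped, instead of sitting in the ring buffer.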
2025-02-21net: stmmac: print stmmac_init_dma_engine() errors using netdev_err()Russell King (Oracle)
stmmac_init_dma_engine() uses dev_err() which leads to errors being reported as e.g.:

dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed

stmmac_init_dma_engine() is only called from stmmac_hw_setup() which itself uses netdev_err(), and we will have a net_device setup. So, change the dev_err() to netdev_err() to give consistent error messages.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tl5y1-004UgG-8X@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21selftests: fib_nexthops: do not mark skipped tests as failedHangbin Liu
The current test marks all unexpected return values as failed and sets ret to 1. If a test is skipped, the entire test also returns 1, incorrectly indicating failure. To fix this, add a skipped variable and set ret to 4 if it was previously 0. Otherwise, keep ret set to 1. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250220085326.1512814-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
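The status handling described above can be modelled as a small fold over test outcomes. This is a Python sketch of the logic only, since the selftest itself is a shell script; `update_ret` and the constant names are hypothetical, while the values 1 and 4 come from the commit message.

```python
RET_PASS, RET_FAIL, RET_SKIP = 0, 1, 4


def update_ret(ret, outcome):
    """Fold one outcome ("pass", "fail" or "skip") into the overall status."""
    if outcome == "fail":
        return RET_FAIL  # a failure always wins
    if outcome == "skip" and ret == RET_PASS:
        return RET_SKIP  # only report a skip if nothing has failed so far
    return ret
```

Folding the outcomes in order gives 4 for a run with passes and skips only, and 1 as soon as any test fails, regardless of where the skips occur.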
2025-02-21Merge branch 'net-fib_rules-add-dscp-mask-support'Jakub Kicinski
Ido Schimmel says: ==================== net: fib_rules: Add DSCP mask support In some deployments users would like to encode path information into certain bits of the IPv6 flow label, the UDP source port and the DSCP field and use this information to route packets accordingly. Redirecting traffic to a routing table based on specific bits in the DSCP field is not currently possible. Only exact match is currently supported by FIB rules. This patchset extends FIB rules to match on the DSCP field with an optional mask. Patches #1-#5 gradually extend FIB rules to match on the DSCP field with an optional mask. Patch #6 adds test cases for the new functionality. iproute2 support can be found here [1]. [1] https://github.com/idosch/iproute2/tree/submit/fib_rule_mask_v1 ==================== Link: https://patch.msgid.link/20250220080525.831924-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21selftests: fib_rule_tests: Add DSCP mask match testsIdo Schimmel
Add tests for FIB rules that match on DSCP with a mask. Test both good and bad flows and both the input and output paths.

# ./fib_rule_tests.sh
IPv6 FIB rule tests
[...]
TEST: rule6 check: dscp redirect to table [ OK ]
TEST: rule6 check: dscp no redirect to table [ OK ]
TEST: rule6 del by pref: dscp redirect to table [ OK ]
TEST: rule6 check: iif dscp redirect to table [ OK ]
TEST: rule6 check: iif dscp no redirect to table [ OK ]
TEST: rule6 del by pref: iif dscp redirect to table [ OK ]
TEST: rule6 check: dscp masked redirect to table [ OK ]
TEST: rule6 check: dscp masked no redirect to table [ OK ]
TEST: rule6 del by pref: dscp masked redirect to table [ OK ]
TEST: rule6 check: iif dscp masked redirect to table [ OK ]
TEST: rule6 check: iif dscp masked no redirect to table [ OK ]
TEST: rule6 del by pref: iif dscp masked redirect to table [ OK ]
[...]
Tests passed: 316
Tests failed: 0

Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-7-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21netlink: specs: Add FIB rule DSCP mask attributeIdo Schimmel
Add new DSCP mask attribute to the spec. Example:

# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
    --do newrule \
    --json '{"family": 2, "dscp": 10, "dscp-mask": 63, "action": 1, "table": 1}'
None
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
    --dump getrule --json '{"family": 2}' --output-json | jq '.[]'
[...]
{
  "table": 1,
  "suppress-prefixlen": "0xffffffff",
  "protocol": 0,
  "priority": 32765,
  "dscp": 10,
  "dscp-mask": "0x3f",
  "family": 2,
  "dst-len": 0,
  "src-len": 0,
  "tos": 0,
  "action": "to-tbl",
  "flags": 0
}
[...]

Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-6-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: fib_rules: Enable DSCP mask usageIdo Schimmel
Allow user space to configure FIB rules that match on DSCP with a mask, now that support has been added to the IPv4 and IPv6 address families. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21ipv6: fib_rules: Add DSCP mask matchingIdo Schimmel
Extend IPv6 FIB rules to match on DSCP using a mask. Unlike IPv4, also initialize the DSCP mask when a non-zero 'tos' is specified as there is no difference in matching between 'tos' and 'dscp'. As a side effect, this makes it possible to match on 'dscp 0', like in IPv4. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21ipv4: fib_rules: Add DSCP mask matchingIdo Schimmel
Extend IPv4 FIB rules to match on DSCP using a mask. The mask is only set in rules that match on DSCP (not TOS) and initialized to cover the entire DSCP field if the mask attribute is not specified. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
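Masked matching on a 6-bit field can be sketched as follows. This is a Python model of the comparison, not the kernel code: `dscp_rule_matches` and `DSCP_FULL_MASK` are hypothetical names, and the default of a full mask mirrors the behavior described above when no mask attribute is given.

```python
DSCP_FULL_MASK = 0x3F  # DSCP is a 6-bit field


def dscp_rule_matches(pkt_dscp, rule_dscp, rule_mask=None):
    """Return True if the packet DSCP equals the rule DSCP under the mask."""
    if rule_mask is None:
        rule_mask = DSCP_FULL_MASK  # no mask given: exact match on all 6 bits
    # Only the bits selected by the mask take part in the comparison.
    return ((pkt_dscp ^ rule_dscp) & rule_mask) == 0
```

With a mask of 0x30, for example, only the two most significant DSCP bits are compared, so packets that differ in the lower four bits still hit the same rule.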
2025-02-21net: fib_rules: Add DSCP mask attributeIdo Schimmel
Add an attribute that allows matching on DSCP with a mask. Matching on DSCP with a mask is needed in deployments where users encode path information into certain bits of the DSCP field. Temporarily set the type of the attribute to 'NLA_REJECT' while support is being added. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-02-20 We've added 19 non-merge commits during the last 8 day(s) which contain a total of 35 files changed, 1126 insertions(+), 53 deletions(-). The main changes are:
1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing
2) Add network TX timestamping support to BPF sock_ops, from Jason Xing
3) Add TX metadata Launch Time support, from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
==================== Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21gve: Add RSS cache for non RSS device option scenarioZiwei Xiao
Not all the devices have the capability for the driver to query for the registered RSS configuration. The driver can discover this by checking the relevant device option during setup. If it cannot, the driver needs to store the RSS config cache and directly return such cache when queried by the ethtool. RSS config is inited when driver probes. Also the default RSS config will be adjusted when there is RX queue count change. At this point, only keys of GVE_RSS_KEY_SIZE and indirection tables of GVE_RSS_INDIR_SIZE are supported. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250219200451.3348166-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net/rds: Replace deprecated strncpy() with strscpy_pad()Thorsten Blum
strncpy() is deprecated for NUL-terminated destination buffers. Use strscpy_pad() instead and remove the manual NUL-termination. Compile-tested only. Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Tested-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250219224730.73093-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
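The difference that matters here is that strscpy_pad() always NUL-terminates the destination, zero-fills the remainder, and reports truncation. The sketch below models those semantics on Python bytes; `strscpy_pad_model` is a hypothetical name and this is an illustration of the behavior, not the kernel implementation.

```python
E2BIG = 7  # errno value the kernel helper returns (negated) on truncation


def strscpy_pad_model(src, size):
    """Return (dest_buffer, result) for copying bytes `src` into `size` bytes."""
    if size == 0:
        return b"", -E2BIG
    src = src.split(b"\0", 1)[0]  # a C string ends at the first NUL
    copied = src[: size - 1]      # always leave room for the terminator
    dest = copied + b"\0" * (size - len(copied))  # terminate and zero-pad
    # The result is the number of bytes copied, or -E2BIG if src did not fit.
    result = len(copied) if len(src) < size else -E2BIG
    return dest, result
```

Because termination and padding happen inside the helper, a caller no longer needs the manual NUL-termination that strncpy() required.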
2025-02-21Merge branch 'net-improve-netns-handling-in-rtnetlink'Jakub Kicinski
Xiao Liang says: ==================== net: Improve netns handling in rtnetlink This patch series includes some netns-related improvements and fixes for rtnetlink, to make link creation more intuitive: 1) Creating link in another net namespace doesn't conflict with link names in current one. 2) Refactor rtnetlink link creation. Create link in target namespace directly. So that # ip link add netns ns1 link-netns ns2 tun0 type gre ... will create tun0 in ns1, rather than create it in ns2 and move to ns1. And don't conflict with another interface named "tun0" in current netns. Patch 01 avoids link name conflict in different netns. To achieve 2), there're mainly 3 steps: - Patch 02 packs newlink() parameters into a struct, including the original "src_net" along with more netns context. No semantic changes are introduced. - Patch 03 ~ 09 converts device drivers to use the explicit netns extracted from params. - Patch 10 ~ 11 removes the old netns parameter, and converts rtnetlink to create device in target netns directly. Patch 12 ~ 13 adds some tests for link name and link netns. --- Please note there're some issues found in current code: - In amt_newlink() drivers/net/amt.c: amt->net = net; ... amt->stream_dev = dev_get_by_index(net, ... Uses net, but amt_lookup_upper_dev() only searches in dev_net. So the AMT device may not be properly deleted if it's in a different netns from lower dev. - In lowpan_newlink() in net/ieee802154/6lowpan/core.c: wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK])); Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined in link netns. 
And thanks to Kuniyuki for fixing related issues in gtp and pfcp: https://lore.kernel.org/netdev/20250110014754.33847-1-kuniyu@amazon.com/ v9: https://lore.kernel.org/20250210133002.883422-1-shaw.leon@gmail.com v8: https://lore.kernel.org/20250113143719.7948-1-shaw.leon@gmail.com v7: https://lore.kernel.org/20250104125732.17335-1-shaw.leon@gmail.com v6: https://lore.kernel.org/20241218130909.2173-1-shaw.leon@gmail.com v5: https://lore.kernel.org/20241209140151.231257-1-shaw.leon@gmail.com v4: https://lore.kernel.org/20241118143244.1773-1-shaw.leon@gmail.com v3: https://lore.kernel.org/20241113125715.150201-1-shaw.leon@gmail.com v2: https://lore.kernel.org/20241107133004.7469-1-shaw.leon@gmail.com v1: https://lore.kernel.org/20241023023146.372653-1-shaw.leon@gmail.com ==================== Link: https://patch.msgid.link/20250219125039.18024-1-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21selftests: net: Add test cases for link and peer netnsXiao Liang
- Add test for creating link in another netns when a link of the same name and ifindex exists in current netns. - Add test to verify that link is created in target netns directly - no link new/del events should be generated in link netns or current netns. - Add test cases to verify that link-netns is set as expected for various drivers and combination of namespace-related parameters. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Link: https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21selftests: net: Add python context manager for netns enteringXiao Liang
Change netns of current thread and switch back on context exit. For example:

    with NetNSEnter("ns1"):
        ip("link add dummy0 type dummy")

The command is executed in netns "ns1".
Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Link: https://patch.msgid.link/20250219125039.18024-13-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Create link directly in target net namespaceXiao Liang
Make rtnl_newlink_create() create device in target namespace directly. Avoid extra netns change when link netns is provided. Device drivers have been converted to be aware of link netns, that is, they no longer assume that device netns and link netns are the same when ops->newlink() is called. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-12-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Remove "net" from newlink paramsXiao Liang
Now that devices have been converted to use the specific netns instead of ambiguous "net", let's remove it from newlink parameters. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-11-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: xfrm: Use link netns in newlink() of rtnl_link_opsXiao Liang
When link_net is set, use it as link netns instead of dev_net(). This prepares for rtnetlink core to create device in target netns directly, in which case the two namespaces may be different. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-10-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: ipv6: Use link netns in newlink() of rtnl_link_opsXiao Liang
When link_net is set, use it as link netns instead of dev_net(). This prepares for rtnetlink core to create device in target netns directly, in which case the two namespaces may be different. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-9-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: ipv6: Init tunnel link-netns before registering devXiao Liang
Currently some IPv6 tunnel drivers set tnl->net to dev_net(dev) in ndo_init(), which is called in register_netdevice(). However, it lacks the context of link-netns when we enable cross-net tunnels at device registration time. Let's move the init of tunnel link-netns before register_netdevice(). ip6_gre has already initialized netns, so just remove the redundant assignment. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-8-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: ip_tunnel: Use link netns in newlink() of rtnl_link_opsXiao Liang
When link_net is set, use it as link netns instead of dev_net(). This prepares for rtnetlink core to create device in target netns directly, in which case the two namespaces may be different. Convert common ip_tunnel_newlink() to accept an extra link netns argument. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-7-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: ip_tunnel: Don't set tunnel->net in ip_tunnel_init()Xiao Liang
ip_tunnel_init() is called from register_netdevice(). In all code paths reaching here, tunnel->net should already have been set (either in ip_tunnel_newlink() or __ip_tunnel_create()). So don't set it again. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-6-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21ieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_opsXiao Liang
Device denoted by IFLA_LINK is in link_net (IFLA_LINK_NETNSID) or source netns by design, but 6lowpan uses dev_net. Note dev->netns_local is set to true and currently link_net is implemented via a netns change. These together effectively reject IFLA_LINK_NETNSID. This patch adds a validation to ensure link_net is either NULL or identical to dev_net. Thus it would be fine to continue using dev_net when rtnetlink core begins to create devices directly in target netns. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-5-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: Use link/peer netns in newlink() of rtnl_link_opsXiao Liang
Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, and link netns falls back to source netns. Convert the use of params->net in netdevice drivers to one of the helper functions for clarity. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
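The fallback chain is short enough to model directly. The sketch below is a Python illustration of the described logic, not the kernel helpers themselves: `NewlinkParams`, its field names, and the two function names are hypothetical stand-ins, with None meaning that the corresponding netns was not given in the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewlinkParams:
    src_net: str                     # netns of the netlink socket
    link_net: Optional[str] = None   # None when no link netns was given
    peer_net: Optional[str] = None   # None when no peer netns was given


def newlink_link_net(p):
    # Link netns falls back to source netns.
    return p.link_net if p.link_net is not None else p.src_net


def newlink_peer_net(p):
    # Peer netns falls back to link netns, which may itself fall back.
    return p.peer_net if p.peer_net is not None else newlink_link_net(p)
```

A driver that needs the namespace of its lower device would use the first helper, and a driver that creates a peer device, in the style of a veth pair, would use the second.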
2025-02-21  rtnetlink: Pack newlink() params into struct  Xiao Liang
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead.

To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
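The src_net/dev_net table in the "Pack newlink() params into struct" message can be restated as a small decision function. This is a userspace sketch for reading the table, not kernel code; `enum netns_role`, `struct legacy_nets`, `netns_role_name()` and `legacy_newlink_nets()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* The four namespace roles named in the commit message. */
enum netns_role { NS_SOURCE, NS_TARGET, NS_LINK, NS_PEER };

/* What the pre-patch newlink() callback effectively received. */
struct legacy_nets {
	enum netns_role src_net;
	enum netns_role dev_net;
};

/* Human-readable role name, handy when printing a result. */
const char *netns_role_name(enum netns_role r)
{
	static const char *const names[] = { "source", "target", "link", "peer" };

	assert(r >= NS_SOURCE && r <= NS_PEER);
	return names[r];
}

/* One table row per combination of the two request attributes. */
struct legacy_nets legacy_newlink_nets(bool has_peer_netns,
				       bool has_link_netnsid)
{
	struct legacy_nets r;

	/* src_net: peer wins when present, then link, then source. */
	if (has_peer_netns)
		r.src_net = NS_PEER;
	else
		r.src_net = has_link_netnsid ? NS_LINK : NS_SOURCE;

	/* dev_net: link netns when IFLA_LINK_NETNSID is present, else target. */
	r.dev_net = has_link_netnsid ? NS_LINK : NS_TARGET;
	return r;
}
```

Written this way, the ambiguity the message complains about is visible: src_net can mean three different roles depending on which attributes the request carried.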
2025-02-21  rtnetlink: Lookup device in target netns when creating link  Xiao Liang
When creating link, lookup for existing device in target net namespace instead of current one. For example, two links created by:

  # ip link add dummy1 type dummy
  # ip link add netns ns1 dummy1 type dummy

should have no conflict since they are in different namespaces.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-2-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
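The effect of looking up the name in the target namespace can be illustrated with a toy model in which each namespace keeps its own list of device names. This is a userspace sketch under stated assumptions, not the rtnetlink implementation: `struct netns`, `netns_has_dev()` and `link_add()` are hypothetical, and the -EEXIST return only models a name conflict.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_DEVS 8
#define NAME_LEN 16

/* Hypothetical namespace model: a small per-namespace table of names. */
struct netns {
	char names[MAX_DEVS][NAME_LEN];
	int count;
};

/* Return 1 if a device called name already exists in ns, else 0. */
int netns_has_dev(const struct netns *ns, const char *name)
{
	for (int i = 0; i < ns->count; i++)
		if (strcmp(ns->names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Create a device in the target namespace. The conflict check consults
 * only the target, so the same name may exist in other namespaces.
 */
int link_add(struct netns *target, const char *name)
{
	assert(target != NULL && name != NULL);

	if (strlen(name) >= NAME_LEN)
		return -EINVAL;
	if (netns_has_dev(target, name))
		return -EEXIST;
	if (target->count == MAX_DEVS)
		return -ENOSPC;

	strcpy(target->names[target->count++], name);
	return 0;
}
```

In this model, adding "dummy1" to the current namespace and then to ns1 succeeds both times, which mirrors the two `ip link add` commands in the message; only a second "dummy1" in the same namespace conflicts.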
2025-02-21  Merge branch 'dt-bindings-net-realtek-rtl9301-switch'  Jakub Kicinski
Chris Packham says:

====================
dt-bindings: net: realtek,rtl9301-switch (schema part)

This is my attempt at trying to sort out the mess I've created with the RTL9300 switch dt-bindings. Some context is available on [1] and [2].

The first patch just moves the binding from mfd/ to net/ (with an adjustment of the internal path name). The next two patches are successors to patches already sent as part of the series [3].

[1] - https://lore.kernel.org/lkml/20250204-eccentric-deer-of-felicity-02b7ee@krzk-bin/
[2] - https://lore.kernel.org/lkml/4e3c5d83-d215-4eff-bf02-6d420592df8f@alliedtelesis.co.nz/
[3] - https://lore.kernel.org/lkml/20250204030249.1965444-1-chris.packham@alliedtelesis.co.nz/
====================

Link: https://patch.msgid.link/20250218195216.1034220-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  dt-bindings: net: Add Realtek MDIO controller  Chris Packham
Add dtschema for the MDIO controller found in the RTL9300 Ethernet switch. The controller is slightly unusual in that direct MDIO communication is not possible. We model the MDIO controller with the MDIO buses as child nodes and the PHYs as children of the buses. The mapping of switch port number to MDIO bus/addr requires the ethernet-ports sibling to provide the mapping via the phy-handle property. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250218195216.1034220-4-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  dt-bindings: net: Add switch ports and interrupts to RTL9300  Chris Packham
Add bindings for the ethernet-switch and interrupt properties for the RTL9300. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250218195216.1034220-3-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  dt-bindings: net: Move realtek,rtl9301-switch to net  Chris Packham
Initially realtek,rtl9301-switch was placed under mfd/ because it had some non-switch related blocks (specifically i2c and reset) but with a bit more review it has become apparent that this was wrong and the binding should live under net/. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Lee Jones <lee@kernel.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250218195216.1034220-2-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  net: sfp: add quirk for 2.5G OEM BX SFP  Birger Koblitz
The OEM SFP-2.5G-BX10-D/U SFP module pair is meant to operate with 2500Base-X. However, in their EEPROM they incorrectly specify:

  Transceiver codes : 0x00 0x12 0x00 0x00 0x12 0x00 0x01 0x05 0x00
  BR, Nominal       : 2500MBd

Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway. Tested on BananaPi R3.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/20250218-b4-lkmsub-v1-1-1e51dcabed90@birger-koblitz.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>