Age | Commit message | Author
2025-02-19 | can: c_can: Simplify handling syscon error path | Krzysztof Kozlowski
Use error handling block instead of open-coding it in one of probe failure cases. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250212-syscon-phandle-args-can-v2-2-ac9a1253396b@linaro.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-19 | can: c_can: Drop useless final probe failure message | Krzysztof Kozlowski
Generic probe failure message is useless: does not give information what failed and it duplicates messages provided by the core, e.g. from memory allocation or platform_get_irq(). It also floods dmesg in case of deferred probe, e.g. resulting from devm_clk_get(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250212-syscon-phandle-args-can-v2-1-ac9a1253396b@linaro.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-19 | Merge branch 'net-mana-big-tcp' | David S. Miller
Shradha Gupta says:

====================
net: Enable Big TCP for MANA devices

Allow the max gso/gro aggregated pkt size to go up to GSO_MAX_SIZE for MANA NIC. On Azure, this is not possible without allowing the same for netvsc NIC (as the NICs are bonded together). Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size to VF's tso_max_size for netvsc too, when the data path is switched over to the VF.

The first patch allows MANA to configure aggregated pkt size of up to GSO_MAX_SIZE.
The second patch enables the same on the netvsc NIC, if the data path for the bonded NIC is switched to the VF.

---
Changes in v3
* Add ipv6_hopopt_jumbo_remove() while sending Big TCP packets
---
Changes in v2
* Instead of using 'tcp segment' throughout the patch used the words 'aggregated pkt size'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-19 | hv_netvsc: Use VF's tso_max_size value when data path is VF | Shradha Gupta
On Azure, increasing VF's gso/gro packet size up to GSO_MAX_SIZE is not possible without allowing the same for netvsc NIC (as the NICs are bonded together). For bonded NICs, the min of the max aggregated pkt size of the members is propagated in the stack. Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size to VF's packet size for netvsc too, when the data path is switched over to the VF. Tested on Azure env with Accelerated Networking enabled and disabled. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
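The propagation rule mentioned in this message (a bonded device ends up with the minimum of its members' maximum aggregated packet sizes) can be sketched in plain userspace C. This is only an illustration: `struct member` and `bond_tso_max_size()` are hypothetical names, not kernel code, and the sizes used when calling it are up to the reader.

```c
#include <stddef.h>

/* Hypothetical stand-in for one member NIC of a bond; not a kernel type. */
struct member {
	unsigned int tso_max_size; /* largest aggregated packet the NIC accepts */
};

/* The stack can only use a size that every member accepts, so the
 * effective limit is the minimum over all members. Returns 0 if n == 0. */
static unsigned int bond_tso_max_size(const struct member *m, size_t n)
{
	unsigned int limit = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0 || m[i].tso_max_size < limit)
			limit = m[i].tso_max_size;
	}
	return limit;
}
```

Under this model, raising the limit on the VF alone changes nothing while the netvsc member still reports the smaller value, which is the motivation the message gives for updating both.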
2025-02-19 | net: mana: Allow tso_max_size to go up-to GSO_MAX_SIZE | Shradha Gupta
Allow the max aggregated pkt size to go up to GSO_MAX_SIZE for MANA NIC. This patch only increases the max allowable gso/gro pkt size for MANA devices and does not change the defaults.

Following are the perf benefits of increasing the pkt aggregate size from the legacy gso_max_size value (64K) to a newer one (up to 511K).

IPv4 tests

for i in {1..10}; do netperf -t TCP_RR -H 10.0.0.5 -p50000 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done

gso_max_size = 64K
min   p90   p99   Throughput
93    171   194   6594.25
97    154   180   7183.74
95    165   189   6927.86
96    165   188   6976.04
93    154   185   7338.05
93    168   189   6938.03
94    169   189   6784.93
92    166   189   7117.56
94    179   191   6678.44
95    157   183   7277.81

gso_max_size = 80K
min   p90   p99   Throughput
93    134   146   8448.75
95    134   140   8396.54
94    137   148   8204.12
94    137   148   8244.41
94    128   139   8666.52
94    141   153   8116.86
94    138   149   8163.92
92    135   142   8362.72
92    134   142   8497.57
93    136   148   8393.23

IPv6 tests

for i in {1..10}; do netperf -t TCP_RR -H fd00:9013:cadd::4 -p50000 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done

gso_max_size = 64K
min   p90   p99   Throughput
108   165   170   6673.2
101   169   189   6451.69
101   165   169   6737.65
102   167   175   6614.64
101   178   189   6247.13
107   163   169   6678.63
106   176   187   6350.86
100   164   169   6617.36
102   163   170   6849.21
102   168   175   6605.7

gso_max_size = 80K
min   p90   p99   Throughput
108   155   166   7183
110   154   163   7268.87
109   152   159   7434.35
107   145   157   7569.15
107   149   164   7496.17
110   154   159   7245.85
108   156   162   7266.24
109   145   158   7526.66
106   145   151   7785.75
111   148   157   7246.65

Tested on Azure env with Accelerated Networking enabled and disabled.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-19 | cgroup/dmem: Don't open-code css_for_each_descendant_pre | Friedrich Vock
The current implementation has a bug: If the current css doesn't contain any pool that is a descendant of the "pool" (i.e. when found_descendant == false), then "pool" will point to some unrelated pool. If the current css has a child, we'll overwrite parent_pool with this unrelated pool on the next iteration. Since we can just check whether a pool refers to the same region to determine whether or not it's related, all the additional pool tracking is unnecessary, so just switch to using css_for_each_descendant_pre for traversal. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250127152754.21325-1-friedrich.vock@gmx.de Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-02-19 | cxl: Fix cross-reference in documentation and add deprecation warning | Andrew Donnellan
commit 5731d41af924 ("cxl: Deprecate driver") labelled the cxl driver as deprecated and moved the ABI documentation to the obsolete/ subdirectory, but didn't update cxl.rst, causing a warning once ff7ff6eb4f809 ("docs: media: Allow creating cross-references for RC ABI") was merged. Fix the cross-reference, and also add a deprecation warning. Fixes: 5731d41af924 ("cxl: Deprecate driver") Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250219064807.175107-1-ajd@linux.ibm.com
2025-02-18 | Merge branch 'net-fix-race-of-rtnl_net_lock-dev_net-dev' | Jakub Kicinski
Kuniyuki Iwashima says: ==================== net: Fix race of rtnl_net_lock(dev_net(dev)). Yael Chemla reported that commit 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().") started to trigger KASAN's use-after-free splat. The problem is that dev_net(dev) fetched before rtnl_net_lock() might be different after rtnl_net_lock(). The patch 2 fixes the issue by checking dev_net(dev) after rtnl_net_lock(), and the patch 3 fixes the same potential issue that would emerge once RTNL is removed. v4: https://lore.kernel.org/20250212064206.18159-1-kuniyu@amazon.com v3: https://lore.kernel.org/20250211051217.12613-1-kuniyu@amazon.com v2: https://lore.kernel.org/20250207044251.65421-1-kuniyu@amazon.com v1: https://lore.kernel.org/20250130232435.43622-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250217191129.19967-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | dev: Use rtnl_net_dev_lock() in unregister_netdev(). | Kuniyuki Iwashima
The following sequence is basically illegal when dev was fetched without lookup because dev_net(dev) might be different after holding rtnl_net_lock(): net = dev_net(dev); rtnl_net_lock(net); Let's use rtnl_net_dev_lock() in unregister_netdev(). Note that there is no real bug in unregister_netdev() for now because RTNL protects the scope even if dev_net(dev) is changed before/after RTNL. Fixes: 00fb9823939e ("dev: Hold per-netns RTNL in (un)?register_netdev().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250217191129.19967-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net(). | Kuniyuki Iwashima
After the cited commit, dev_net(dev) is fetched before holding RTNL and passed to __unregister_netdevice_notifier_net(). However, dev_net(dev) might be different after holding RTNL.

In the reported case [0], while removing a VF device, its netns was being dismantled and the VF was moved to init_net.

So the following sequence is basically illegal when dev was fetched without lookup:

  net = dev_net(dev);
  rtnl_net_lock(net);

Let's use a new helper rtnl_net_dev_lock() to fix the race. It fetches dev_net_rcu(dev), bumps its net->passive, and checks if dev_net_rcu(dev) is changed after rtnl_net_lock().

[0]:
BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:123)
 print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
 kasan_report (mm/kasan/report.c:604)
 notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
 call_netdevice_notifiers_info (net/core/dev.c:2011)
 unregister_netdevice_many_notify (net/core/dev.c:11551)
 unregister_netdevice_queue (net/core/dev.c:11487)
 unregister_netdev (net/core/dev.c:11635)
 mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
 auxiliary_bus_remove (drivers/base/auxiliary.c:230)
 device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
 bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
 device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
 mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
 mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
 mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
 remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
 pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
 device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
 unbind_store (drivers/base/bus.c:245)
 kernfs_fop_write_iter (fs/kernfs/file.c:338)
 vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
 ksys_write (fs/read_write.c:732)
 do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f6a4d5018b7

Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
Reported-by: Yael Chemla <ychemla@nvidia.com>
Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250217191129.19967-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
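The fetch, pin, lock, re-check sequence that this message attributes to rtnl_net_dev_lock() can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct ns`, `struct dev`, `dev_ns_lock()` and the toy spinlock are hypothetical stand-ins, and the sketch does not model RCU or the freeing of a namespace, which the real helper has to handle.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct net and struct net_device. */
struct ns {
	atomic_flag lock;   /* toy spinlock standing in for the per-netns lock */
	atomic_int passive; /* reference that keeps the ns memory alive */
};

struct dev {
	_Atomic(struct ns *) ns; /* may be changed concurrently by a "move" */
};

static void ns_lock(struct ns *ns)
{
	while (atomic_flag_test_and_set(&ns->lock))
		; /* spin until the lock is free */
}

static void ns_unlock(struct ns *ns)
{
	atomic_flag_clear(&ns->lock);
}

/* Lock the namespace the device currently belongs to. The pointer read
 * before locking may be stale, so re-check it once the lock is held and
 * retry if the device moved in the meantime. */
static struct ns *dev_ns_lock(struct dev *dev)
{
	struct ns *ns;

	for (;;) {
		ns = atomic_load(&dev->ns);
		atomic_fetch_add(&ns->passive, 1); /* pin before waiting for the lock */
		ns_lock(ns);
		if (ns == atomic_load(&dev->ns))
			return ns; /* still the right namespace */
		ns_unlock(ns);
		atomic_fetch_sub(&ns->passive, 1);
	}
}

/* Release the lock and drop the pin taken by dev_ns_lock(). */
static void dev_ns_unlock(struct ns *ns)
{
	ns_unlock(ns);
	atomic_fetch_sub(&ns->passive, 1);
}
```

The point of the re-check is that the plain "read pointer, then lock" sequence can end up holding the lock of a namespace the device has already left.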
2025-02-18 | net: Add net_passive_inc() and net_passive_dec(). | Kuniyuki Iwashima
net_drop_ns() is NULL when CONFIG_NET_NS is disabled. The next patch introduces a function that increments and decrements net->passive. As a prep, let's rename and export net_free() to net_passive_dec() and add net_passive_inc(). Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250217191129.19967-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: pse-pd: pd692x0: Fix power limit retrieval | Kory Maincent
Fix incorrect data offset read in the pd692x0_pi_get_pw_limit callback. The issue was previously unnoticed as it was only used by the regulator API and not thoroughly tested, since the PSE is mainly controlled via ethtool. The function became actively used by ethtool after commit 3e9dbfec4998 ("net: pse-pd: Split ethtool_get_status into multiple callbacks"), which led to the discovery of this issue. Fix it by using the correct data offset. Fixes: a87e699c9d33 ("net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250217134812.1925345-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | Merge branch 'net-deduplicate-cookie-logic' | Jakub Kicinski
Willem de Bruijn says: ==================== net: deduplicate cookie logic Reuse standard sk, ip and ipv6 cookie init handlers where possible. Avoid repeated open coding of the same logic. Harmonize feature sets across protocols. Make IPv4 and IPv6 logic more alike. Simplify adding future new fields with a single init point. ==================== Link: https://patch.msgid.link/20250214222720.3205500-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv6: initialize inet socket cookies with sockcm_init | Willem de Bruijn
Avoid open coding the same logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-8-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv6: replace ipcm6_init calls with ipcm6_init_sk | Willem de Bruijn
This initializes tclass and dontfrag before cmsg parsing, removing the need for explicit checks against -1 in each caller. Leave hlimit set to -1, because its full initialization (in ip6_sk_dst_hoplimit) requires more state (dst, flowi6, ..). This also prepares for calling sockcm_init in a follow-on patch. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-7-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | icmp: reflect tos through ip cookie rather than updating inet_sk | Willem de Bruijn
Do not modify socket fields if it can be avoided. The current code predates the introduction of ip cookies in commit aa6615814533 ("ipv4: processing ancillary IP_TOS or IP_TTL"). Now that cookies exist and support tos, update that field directly. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-6-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv4: remove get_rttos | Willem de Bruijn
Initialize the ip cookie tos field when initializing the cookie, in ipcm_init_sk. The existing code inverts the standard pattern for initializing cookie fields. Default is to initialize the field from the sk, then possibly overwrite that when parsing cmsgs (the unlikely case). This field inverts that, setting the field to an illegal value and after cmsg parsing checking whether the value is still illegal and thus should be overridden. Be careful to always apply mask INET_DSCP_MASK, as before. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-5-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
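The "standard pattern" this message refers to (seed the cookie field from the socket, then let a control message overwrite it in the unlikely case) might look roughly like the following userspace sketch. The struct and function names are hypothetical, the `0xfc` mask is assumed to mirror INET_DSCP_MASK, and where exactly the kernel applies the mask is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DSCP_MASK 0xfc /* assumed to mirror INET_DSCP_MASK: upper six bits of tos */

/* Hypothetical, simplified stand-ins for the socket and the ip cookie. */
struct sock_sketch { uint8_t tos; };
struct cookie_sketch { uint8_t tos; };

/* Standard pattern, step 1: seed the cookie from the socket. */
static void cookie_init_sk(struct cookie_sketch *c, const struct sock_sketch *sk)
{
	c->tos = sk->tos;
}

/* Step 2: an optional per-call control message overwrites the default. */
static void cookie_apply_cmsg(struct cookie_sketch *c, bool has_tos, uint8_t tos)
{
	if (has_tos)
		c->tos = tos;
}

/* Readers always go through the mask, whichever source set the field. */
static uint8_t cookie_dscp(const struct cookie_sketch *c)
{
	return c->tos & DSCP_MASK;
}
```

Compared with the inverted approach described in the message, no sentinel value and no post-parse check are needed, because the field is valid from the moment the cookie is initialized.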
2025-02-18 | ipv4: initialize inet socket cookies with sockcm_init | Willem de Bruijn
Avoid open coding the same logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-4-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: initialize mark in sockcm_init | Willem de Bruijn
Avoid open coding initialization of sockcm fields. Avoid reading the sk_priority field twice. This ensures all callers, existing and future, will correctly try a cmsg passed mark before sk_mark. This patch extends support for cmsg mark to: packet_spkt and packet_tpacket and net/can/raw.c. This patch extends support for cmsg priority to: packet_spkt and packet_tpacket. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-3-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | tcp: only initialize sockcm tsflags field | Willem de Bruijn
TCP only reads the tsflags field. Don't bother initializing others. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-2-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: stmmac: Use str_enabled_disabled() helper | Yu-Chun Lin
As kernel test robot reported, the following warning occurs: cocci warnings: (new ones prefixed by >>) >> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c:582:6-8: opportunity for str_enabled_disabled(on) Replace ternary (condition ? "enabled" : "disabled") with str_enabled_disabled() from string_choices.h to improve readability, maintain uniform string usage, and reduce binary size through linker deduplication. Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Link: https://patch.msgid.link/20250217155833.3105775-1-eleanor15x@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
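For readers unfamiliar with the helper, the replacement described here amounts to swapping an open-coded ternary for a one-line function. The sketch below re-creates the helper in userspace for illustration only (the kernel version lives in string_choices.h); `format_state()` and its arguments are hypothetical and do not come from the stmmac driver.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace re-creation for illustration; the kernel provides its own
 * definition in string_choices.h. */
static inline const char *str_enabled_disabled(bool v)
{
	return v ? "enabled" : "disabled";
}

/* Hypothetical caller: formats "<feature> enabled|disabled" into buf
 * instead of repeating the ternary at every call site. */
static int format_state(char *buf, size_t len, const char *feature, bool on)
{
	return snprintf(buf, len, "%s %s", feature, str_enabled_disabled(on));
}
```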
2025-02-18 | net: Remove redundant variable declaration in __dev_change_flags() | Breno Leitao
The old_flags variable is declared twice in __dev_change_flags(), causing a shadow variable warning. This patch fixes the issue by removing the redundant declaration, reusing the existing old_flags variable instead.

  net/core/dev.c:9225:16: warning: declaration shadows a local variable [-Wshadow]
   9225 |                 unsigned int old_flags = dev->flags;
        |                              ^
  net/core/dev.c:9185:15: note: previous declaration is here
   9185 |         unsigned int old_flags = dev->flags;
        |                      ^
  1 warning generated.

Remove the redundant inner declaration and reuse the existing old_flags variable since its value is not needed outside the if block, and it is safe to reuse the variable. This eliminates the warning while maintaining the same functionality.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250217-old_flags-v2-1-4cda3b43a35f@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | selftests: net: Fix few spelling mistakes | Chandra Mohan Sundar
Fix few spelling mistakes in net selftests Signed-off-by: Chandra Mohan Sundar <chandru.dav@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250217141520.81033-1-chandru.dav@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: ethernet: mediatek: add EEE support | Qingfang Deng
Add EEE support to MediaTek SoC Ethernet. The register fields are similar to the ones in MT7531, except that the LPI threshold is in milliseconds. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250217094022.1065436-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: freescale: ucc_geth: make ugeth_mac_ops be static const | Pei Xiao
sparse warning: sparse: symbol 'ugeth_mac_ops' was not declared. Should it be static. Add static to fix sparse warnings and add const. phylink_create() will accept a const struct. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/202502141128.9HfxcdIE-lkp@intel.com Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | Merge branch 'net-phy-improve-and-simplify-eee-handling-in-phylib' | Jakub Kicinski
Heiner Kallweit says: ==================== net: phy: improve and simplify EEE handling in phylib This series improves and simplifies phylib's EEE handling. ==================== Link: https://patch.msgid.link/3caa3151-13ac-44a8-9bb6-20f82563f698@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active | Heiner Kallweit
After the last user has gone, we can remove the local advertisement parameter from genphy_c45_eee_is_active. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: c45: use cached EEE advertisement in genphy_c45_ethtool_get_eee | Heiner Kallweit
Now that disabled EEE modes are considered when populating advertising_eee, we can use this bitmap here instead of reading the PHY register. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/e57ed3d4-d0bc-4f91-83f6-8f48dfb6d7d7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: c45: Don't silently remove disabled EEE modes any longer when writing advertisement register | Heiner Kallweit
advertising_eee is adjusted now whenever an EEE mode gets disabled. Therefore we can remove the silent removal of disabled EEE modes here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/e95b9dad-24a7-4e3e-9af9-6f0770cf1520@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: remove disabled EEE modes from advertising_eee in phy_probe | Heiner Kallweit
A PHY driver may populate eee_disabled_modes in its probe or get_features callback, therefore filter the EEE advertisement read from the PHY. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/493f3e2e-9cfc-445d-adbe-58d9c117a489@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: improve phy_disable_eee_mode | Heiner Kallweit
If a mode is to be disabled, remove it from advertising_eee. Disabling EEE modes shall be done before calling phy_start(), warn if that's not the case. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
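The behaviour described here (record the disabled mode, drop it from the advertisement, and warn if the PHY is already running) can be sketched with plain bitmasks. This is a simplified userspace model under stated assumptions: `struct phy_sketch` and `sketch_disable_eee_mode()` are hypothetical, and the kernel works on link-mode bitmaps rather than a single `unsigned long`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified PHY state; not the kernel's struct phy_device. */
struct phy_sketch {
	unsigned long eee_disabled_modes; /* modes the driver refuses to use */
	unsigned long advertising_eee;    /* modes advertised to the link partner */
	bool started;
};

/* Disable one EEE mode: record it and stop advertising it. Doing this
 * after the PHY has been started is treated as a driver bug, so warn. */
static void sketch_disable_eee_mode(struct phy_sketch *phy, unsigned int bit)
{
	if (phy->started)
		fprintf(stderr, "warning: EEE mode disabled after start\n");

	phy->eee_disabled_modes |= 1UL << bit;
	phy->advertising_eee &= ~(1UL << bit);
}
```

Keeping the advertisement bitmap in sync at the moment a mode is disabled is what allows later code to trust the cached bitmap instead of filtering it on every use.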
2025-02-18 | net: phy: move definition of phy_is_started before phy_disable_eee_mode | Heiner Kallweit
In preparation of a follow-up patch, move phy_is_started() to before phy_disable_eee_mode(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | MAINTAINERS: trim the GVE entry | Jakub Kicinski
We requested in the past that GVE patches coming out of Google should be submitted only by GVE maintainers. There were too many patches posted which didn't follow the subsystem guidance. Recently Joshua was added to maintainers, but even tho he was asked to follow the netdev "FAQ" in the past [1] he does not follow the local customs. It is not reasonable for a person who hasn't read the maintainer entry for the subsystem to be a driver maintainer. We can re-add once Joshua does some on-list reviews to prove the fluency with the upstream process. Link: https://lore.kernel.org/20240610172720.073d5912@kernel.org # [1] Link: https://patch.msgid.link/20250215162646.2446559-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: phy: realtek: add defines for shadowed c45 standard registers | Heiner Kallweit
Realtek shadows standard c45 registers in VEND2 device register space. Add defines for these VEND2 registers, based on the names of the standard c45 registers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/c90bdf76-f8b8-4d06-9656-7a52d5658ee6@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | netlink: Unset cb_running when terminating dump on release | Siddh Raman Pant
When we terminated the dump, the callback isn't running, so cb_running should be set to false to be logically consistent. cb_running signifies whether a dump is ongoing. It is set to true in cb->start(), and is checked in netlink_dump() to be true initially. After the dump, it is set to false in the same function. This is just a cleanup, no path should access this field on a closed socket. Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Link: https://patch.msgid.link/aff028e3eb2b768b9895fa6349fa1981ae22f098.camel@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | gve: set xdp redirect target only when it is available | Joshua Washington
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by default as part of driver initialization, and is never cleared. However, this flag differs from others in that it is used as an indicator for whether the driver is ready to perform the ndo_xdp_xmit operation as part of an XDP_REDIRECT. Kernel helpers xdp_features_(set|clear)_redirect_target exist to convey this meaning. This patch ensures that the netdev is only reported as a redirect target when XDP queues exist to forward traffic. Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format") Cc: stable@vger.kernel.org Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | Merge branch 'net-cadence-macb-modernize-statistics-reporting' | Jakub Kicinski
Sean Anderson says: ==================== net: cadence: macb: Modernize statistics reporting Implement the modern interfaces for statistics reporting. ==================== Link: https://patch.msgid.link/20250214212703.2618652-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: cadence: macb: Report standard stats | Sean Anderson
Report standard statistics using the dedicated callbacks instead of get_ethtool_stats. OCTTX is split over two registers. Accumulating these registers separately in gem_stats just means we need to combine them again later. Instead, combine these stats before saving them, like is done for ethtool_stats. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214212703.2618652-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
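Combining a counter that is split over two registers before storing it, as this message describes for OCTTX, comes down to a shift and an OR. The following is a generic sketch, assuming two 32-bit halves; `combine_counter()` and `accumulate()` are hypothetical names and the actual register widths in the hardware are not taken from the source.

```c
#include <stdint.h>

/* Join the low and high 32-bit halves of a counter into one 64-bit value. */
static uint64_t combine_counter(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* Hypothetical accumulator: add the combined reading to a running total,
 * so only one 64-bit statistic has to be kept per counter. */
static void accumulate(uint64_t *total, uint32_t low, uint32_t high)
{
	*total += combine_counter(low, high);
}
```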
2025-02-18 | net: cadence: macb: Convert to get_stats64 | Sean Anderson
Convert the existing get_stats implementation to get_stats64. Since we now report 64-bit values, increase the counters to 64-bits as well. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214212703.2618652-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: xilinx: axienet: Implement BQL | Sean Anderson
Implement byte queue limits to allow queueing disciplines to account for packets enqueued in the ring buffers but not yet transmitted. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250214211252.2615573-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out() | Bart Van Assche
ufshcd_is_ufs_dev_busy(), ufshcd_print_host_state() and ufshcd_eh_timed_out() are used in both modes (legacy mode and MCQ mode). hba->outstanding_reqs only represents the outstanding requests in legacy mode. Hence, change hba->outstanding_reqs into scsi_host_busy(hba->host) in these functions. Fixes: eacb139b77ff ("scsi: ufs: core: mcq: Enable multi-circular queue") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250214224352.3025151-1-bvanassche@acm.org Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-18 | Merge branch 'bpf-skip-non-exist-keys-in-generic_map_lookup_batch' | Alexei Starovoitov
Yan Zhai says:

====================
bpf: skip non exist keys in generic_map_lookup_batch

The generic_map_lookup_batch currently returns EINTR if it fails with ENOENT and retries several times on bpf_map_copy_value. The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array of maps type may legitimately contain such holes. Right now, when these holes show up, generic batch lookup cannot proceed any more. It will always fail with EINTR errors.

This patch fixes this behavior by skipping the non-existing key, and does not return EINTR any more.

V2->V3: deleted an unused macro
V1->V2: split the fix and selftests; fixed a few selftests issues.

V2: https://lore.kernel.org/bpf/cover.1738905497.git.yan@cloudflare.com/
V1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
====================

Link: https://patch.msgid.link/cover.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18 | selftests: bpf: test batch lookup on array of maps with holes | Yan Zhai
Iterating through an array of maps may encounter non-existing keys. The batch operation should not fail when this happens. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/9007237b9606dc2ee44465a4447fe46e13f3bea6.1739171594.git.yan@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18bpf: skip non exist keys in generic_map_lookup_batchYan Zhai
The generic_map_lookup_batch currently returns EINTR if it fails with ENOENT and retries several times on bpf_map_copy_value. The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array of maps type may contain such holes legitly. Right now these holes show up, generic batch lookup cannot proceed any more. It will always fail with EINTR errors. Rather, do not retry in generic_map_lookup_batch. If it finds a non existing element, skip to the next key. This simple solution comes with a price that transient errors may not be recovered, and the iteration might cycle back to the first key under parallel deletion. For example, Hou Tao <houtao@huaweicloud.com> pointed out a following scenario: For LPM trie map: (1) ->map_get_next_key(map, prev_key, key) returns a valid key (2) bpf_map_copy_value() return -ENOMENT It means the key must be deleted concurrently. (3) goto next_key It swaps the prev_key and key (4) ->map_get_next_key(map, prev_key, key) again prev_key points to a non-existing key, for LPM trie it will treat just like prev_key=NULL case, the returned key will be duplicated. With the retry logic, the iteration can continue to the key next to the deleted one. But if we directly skip to the next key, the iteration loop would restart from the first key for the lpm_trie type. However, not all races may be recovered. For example, if current key is deleted after instead of before bpf_map_copy_value, or if the prev_key also gets deleted, then the loop will still restart from the first key for lpm_tire anyway. For generic lookup it might be better to stay simple, i.e. just skip to the next key. To guarantee that the output keys are not duplicated, it is better to implement map type specific batch operations, which can properly lock the trie and synchronize with concurrent mutators. 
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op") Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
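The difference between the retry and skip behaviours described above can be sketched with a toy model in Python. This is not the kernel code: `ToyArrayOfMaps`, `lookup_batch_retry`, `lookup_batch_skip` and `MAX_RETRIES` are hypothetical names invented for illustration, and the model ignores batching limits, user-space copies and concurrency. It only shows why a permanent hole stalls the retrying loop while the skipping loop makes progress.

```python
import errno

MAX_RETRIES = 3  # hypothetical retry budget, only for this toy model


class ToyArrayOfMaps:
    """Toy stand-in for a map whose key space can contain holes (None)."""

    def __init__(self, slots):
        self.slots = list(slots)

    def get_next_key(self, prev_key):
        # Every index is a key, whether or not its slot holds a value.
        nxt = 0 if prev_key is None else prev_key + 1
        return nxt if nxt < len(self.slots) else None

    def copy_value(self, key):
        value = self.slots[key]
        if value is None:
            return -errno.ENOENT, None
        return 0, value


def lookup_batch_retry(m):
    """Old behaviour (simplified): retry on ENOENT, then give up with EINTR."""
    out = []
    key = m.get_next_key(None)
    while key is not None:
        err, value = m.copy_value(key)
        retries = MAX_RETRIES
        while err == -errno.ENOENT and retries > 0:
            retries -= 1
            err, value = m.copy_value(key)
        if err == -errno.ENOENT:
            # The caller would restart from the same key and hit the hole again.
            return -errno.EINTR, out
        out.append((key, value))
        key = m.get_next_key(key)
    return 0, out


def lookup_batch_skip(m):
    """New behaviour (simplified): a missing element is skipped, not retried."""
    out = []
    key = m.get_next_key(None)
    while key is not None:
        err, value = m.copy_value(key)
        if err == 0:
            out.append((key, value))
        key = m.get_next_key(key)
    return 0, out


toy = ToyArrayOfMaps(["a", None, "c"])  # index 1 is a permanent hole
old_err, old_out = lookup_batch_retry(toy)
new_err, new_out = lookup_batch_skip(toy)
```

In this model the retrying variant never gets past index 1, while the skipping variant returns the entries at indexes 0 and 2. The duplicate-key risk for lpm_trie under concurrent deletion is not modelled here.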
2025-02-18tcp: adjust rcvq_space after updating scaling ratioJakub Kicinski
Since the commit under Fixes we set the window clamp in accordance with the newly measured rcvbuf scaling_ratio. If the scaling_ratio decreased significantly we may put ourselves in a situation where windows become smaller than rcvq_space, preventing tcp_rcv_space_adjust() from increasing rcvbuf.

The significant decrease of scaling_ratio is far more likely since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"), which increased the "default" scaling ratio from ~30% to 50%.

Hitting the bad condition depends a lot on TCP tuning and on the drivers at play. One of Meta's workloads hits it reliably under the following conditions:
- default rcvbuf of 125k
- sender MTU 1500, receiver MTU 5000
- driver settles on scaling_ratio of 78 for the config above.

Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss (10 * 5k = 50k). Once we find out the true scaling ratio and MSS we clamp the windows to 38k. Triggering the condition also depends on the message sequence of this workload. I can't repro the problem with simple iperf or TCP_RR-style tests.

Fixes: a2cbb1603943 ("tcp: Update window clamping condition") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250217232905.3162187-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
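The numbers quoted in this message can be reproduced with a few lines of arithmetic. The sketch below is only a worked example. It assumes the scaling ratio is a fixed-point fraction in 1/256ths, so that 78 is about 30% and 128 is 50%; the commit message does not state that scale. `win_from_space` and the variable names are hypothetical helpers, not kernel symbols.

```python
TCP_INIT_CWND = 10
RATIO_SHIFT = 8  # assumption: scaling_ratio is expressed in 1/256ths


def win_from_space(space, scaling_ratio):
    """Window usable out of `space` bytes of buffer at the given ratio."""
    return (space * scaling_ratio) >> RATIO_SHIFT


rcvbuf = 125_000     # default rcvbuf from the message
advmss = 5_000       # "5k" receiver MSS from the message
scaling_ratio = 78   # ratio the driver settles on

# Initial rcvq_space: TCP_INIT_CWND * advmss
initial_rcvq_space = TCP_INIT_CWND * advmss

# Window clamp once the true scaling ratio is known
clamped_window = win_from_space(rcvbuf, scaling_ratio)

# The bad condition: the window is now smaller than rcvq_space
window_below_rcvq_space = clamped_window < initial_rcvq_space

print(initial_rcvq_space, clamped_window, window_below_rcvq_space)
```

Under these assumptions the clamp comes out at roughly 38k against an rcvq_space of 50k, which is the situation the message describes. With the 50% default ratio (128 in this model) the same buffer would give a window above 50k.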
2025-02-18io_uring: fix spelling error in uapi io_uring.hJens Axboe
This is obviously not that important, but when changes are synced back from the kernel to liburing, the codespell CI ends up erroring because of this misspelling. Let's just correct it and avoid this biting us again on an import. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-18net: phy: realtek: add helper RTL822X_VND2_C22_REGHeiner Kallweit
C22 register space is mapped to 0xa400 in MMD VEND2 register space. Add a helper to access mapped C22 registers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/6344277b-c5c7-449b-ac89-d5425306ca76@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
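As a rough illustration of the mapping, the sketch below computes a VEND2 address for a C22 register number. The base 0xa400 comes from the message above. The stride of 2 per register is an assumption not stated here, and `rtl822x_vnd2_c22_reg` is a hypothetical Python stand-in for the helper, not the driver macro itself.

```python
RTL822X_VND2_C22_BASE = 0xA400  # base of the mapped C22 space (from the message)
C22_REG_STRIDE = 2              # assumption: each C22 register occupies 2 addresses

# Standard C22 (MII) register numbers used as examples
MII_BMCR = 0x00
MII_BMSR = 0x01
MII_CTRL1000 = 0x09


def rtl822x_vnd2_c22_reg(reg):
    """Map a C22 register number (0..31) into the MMD VEND2 address space."""
    if not 0 <= reg <= 31:
        raise ValueError("C22 register numbers are 0..31")
    return RTL822X_VND2_C22_BASE + C22_REG_STRIDE * reg


bmcr_addr = rtl822x_vnd2_c22_reg(MII_BMCR)
ctrl1000_addr = rtl822x_vnd2_c22_reg(MII_CTRL1000)
print(hex(bmcr_addr), hex(ctrl1000_addr))
```

If the stride assumption holds, a helper like this lets callers name the C22 register they mean instead of hard-coding a VEND2 address.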
2025-02-18Merge branch 'eth-mlx4-use-the-page-pool-for-rx-buffers'Jakub Kicinski
Jakub Kicinski says: ==================== eth: mlx4: use the page pool for Rx buffers Convert mlx4 to page pool. I've been sitting on these patches for over a year, and Jonathan Lemon had a similar series years before. We never deployed it or sent upstream because it didn't really show much perf win under normal load (admittedly I think the real testing was done before Ilias's work on recycling). During the v6.9 kernel rollout Meta's CDN team noticed that machines with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time spent mapping buffers in the IOMMU). The problem does not occur with modern NICs, so I dusted off this series and reportedly it still works. And it makes the problem go away, no overloads, perf back in line with older kernels. Something must have changed in IOMMU code, I guess. This series is very simple, and can very likely be optimized further. Thing is, I don't have access to any CX3 Pro NICs. They only exist in CDN locations which haven't had a HW refresh for a while. So I can say this series survives a week under traffic w/ XDP enabled, but my ability to iterate and improve is a bit limited. v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250213010635.1354034-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18eth: mlx4: use the page pool for Rx buffersJakub Kicinski
Simple conversion to page pool. Preserve the current fragmentation logic / page splitting. Each page starts with a single frag reference, and then we bump that when attaching to skbs. This can likely be optimized further. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18eth: mlx4: remove the local XDP fast-recycling ringJakub Kicinski
It will be replaced with page pool's built-in recycling. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>