Age | Commit message (Collapse) | Author |
|
Use error handling block instead of open-coding it in one of probe
failure cases.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250212-syscon-phandle-args-can-v2-2-ac9a1253396b@linaro.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Generic probe failure message is useless: does not give information what
failed and it duplicates messages provided by the core, e.g. from memory
allocation or platform_get_irq(). It also floods dmesg in case of
deferred probe, e.g. resulting from devm_clk_get().
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250212-syscon-phandle-args-can-v2-1-ac9a1253396b@linaro.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Shradha Gupta says:
====================
net: Enable Big TCP for MANA devices
Allow the max gso/gro aggregated pkt size to go up to GSO_MAX_SIZE for
MANA NIC. On Azure, this not possible without allowing the same for
netvsc NIC (as the NICs are bonded together).
Therefore, we use netif_set_tso_max_size() to set max aggregated pkt
size
to VF's tso_max_size for netvsc too, when the data path is switched over
to the VF
The first patch allows MANA to configure aggregated pkt size of up-to
GSO_MAX_SIZE
The second patch enables the same on the netvsc NIC, if the data path
for the bonded NIC is switched to the VF
---
Changes in v3
* Add ipv6_hopopt_jumbo_remove() while sending Big TCP packets
---
Changes in v2
* Instead of using 'tcp segment' throughout the patch used the words
'aggregated pkt size'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On Azure, increasing VF's gso/gro packet size to up-to GSO_MAX_SIZE
is not possible without allowing the same for netvsc NIC
(as the NICs are bonded together). For bonded NICs, the min of the max
aggregated pkt size of the members is propagated in the stack.
Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size
to VF's packet size for netvsc too, when the data path is switched over
to the VF
Tested on azure env with Accelerated Networking enabled and disabled.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the max aggregated pkt size to go up-to GSO_MAX_SIZE for MANA NIC.
This patch only increases the max allowable gso/gro pkt size for MANA
devices and does not change the defaults.
Following are the perf benefits by increasing the pkt aggregate size from
legacy gso_max_size value(64K) to newer one(up-to 511K
IPv4 tests
for i in {1..10}; do netperf -t TCP_RR -H 10.0.0.5 -p50000 -- -r80000,80000
-O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
min p90 p99 Throughput gso_max_size
93 171 194 6594.25
97 154 180 7183.74
95 165 189 6927.86
96 165 188 6976.04
93 154 185 7338.05 64K
93 168 189 6938.03
94 169 189 6784.93
92 166 189 7117.56
94 179 191 6678.44
95 157 183 7277.81
min p90 p99 Throughput
93 134 146 8448.75
95 134 140 8396.54
94 137 148 8204.12
94 137 148 8244.41
94 128 139 8666.52 80K
94 141 153 8116.86
94 138 149 8163.92
92 135 142 8362.72
92 134 142 8497.57
93 136 148 8393.23
IPv6 Tests
for i in {1..10}; do netperf -t TCP_RR -H fd00:9013:cadd::4 -p50000 --
-r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
min p90 p99 Throughput gso_max_size
108 165 170 6673.2
101 169 189 6451.69
101 165 169 6737.65
102 167 175 6614.64
101 178 189 6247.13 64K
107 163 169 6678.63
106 176 187 6350.86
100 164 169 6617.36
102 163 170 6849.21
102 168 175 6605.7
min p90 p99 Throughput
108 155 166 7183
110 154 163 7268.87
109 152 159 7434.35
107 145 157 7569.15
107 149 164 7496.17 80K
110 154 159 7245.85
108 156 162 7266.24
109 145 158 7526.66
106 145 151 7785.75
111 148 157 7246.65
Tested on azure env with Accelerated Networking enabled and disabled.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implementation has a bug: If the current css doesn't
contain any pool that is a descendant of the "pool" (i.e. when
found_descendant == false), then "pool" will point to some unrelated
pool. If the current css has a child, we'll overwrite parent_pool with
this unrelated pool on the next iteration.
Since we can just check whether a pool refers to the same region to
determine whether or not it's related, all the additional pool tracking
is unnecessary, so just switch to using css_for_each_descendant_pre for
traversal.
Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250127152754.21325-1-friedrich.vock@gmx.de
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
commit 5731d41af924 ("cxl: Deprecate driver") labelled the cxl driver as
deprecated and moved the ABI documentation to the obsolete/ subdirectory,
but didn't update cxl.rst, causing a warning once ff7ff6eb4f809 ("docs:
media: Allow creating cross-references for RC ABI") was merged.
Fix the cross-reference, and also add a deprecation warning.
Fixes: 5731d41af924 ("cxl: Deprecate driver")
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250219064807.175107-1-ajd@linux.ibm.com
|
|
Kuniyuki Iwashima says:
====================
net: Fix race of rtnl_net_lock(dev_net(dev)).
Yael Chemla reported that commit 7fb1073300a2 ("net: Hold rtnl_net_lock()
in (un)?register_netdevice_notifier_dev_net().") started to trigger KASAN's
use-after-free splat.
The problem is that dev_net(dev) fetched before rtnl_net_lock() might be
different after rtnl_net_lock().
The patch 2 fixes the issue by checking dev_net(dev) after rtnl_net_lock(),
and the patch 3 fixes the same potential issue that would emerge once RTNL
is removed.
v4: https://lore.kernel.org/20250212064206.18159-1-kuniyu@amazon.com
v3: https://lore.kernel.org/20250211051217.12613-1-kuniyu@amazon.com
v2: https://lore.kernel.org/20250207044251.65421-1-kuniyu@amazon.com
v1: https://lore.kernel.org/20250130232435.43622-1-kuniyu@amazon.com
====================
Link: https://patch.msgid.link/20250217191129.19967-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following sequence is basically illegal when dev was fetched
without lookup because dev_net(dev) might be different after holding
rtnl_net_lock():
net = dev_net(dev);
rtnl_net_lock(net);
Let's use rtnl_net_dev_lock() in unregister_netdev().
Note that there is no real bug in unregister_netdev() for now
because RTNL protects the scope even if dev_net(dev) is changed
before/after RTNL.
Fixes: 00fb9823939e ("dev: Hold per-netns RTNL in (un)?register_netdev().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250217191129.19967-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After the cited commit, dev_net(dev) is fetched before holding RTNL
and passed to __unregister_netdevice_notifier_net().
However, dev_net(dev) might be different after holding RTNL.
In the reported case [0], while removing a VF device, its netns was
being dismantled and the VF was moved to init_net.
So the following sequence is basically illegal when dev was fetched
without lookup:
net = dev_net(dev);
rtnl_net_lock(net);
Let's use a new helper rtnl_net_dev_lock() to fix the race.
It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
dev_net_rcu(dev) is changed after rtnl_net_lock().
[0]:
BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
kasan_report (mm/kasan/report.c:604)
notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
call_netdevice_notifiers_info (net/core/dev.c:2011)
unregister_netdevice_many_notify (net/core/dev.c:11551)
unregister_netdevice_queue (net/core/dev.c:11487)
unregister_netdev (net/core/dev.c:11635)
mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
auxiliary_bus_remove (drivers/base/auxiliary.c:230)
device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
unbind_store (drivers/base/bus.c:245)
kernfs_fop_write_iter (fs/kernfs/file.c:338)
vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
ksys_write (fs/read_write.c:732)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f6a4d5018b7
Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
Reported-by: Yael Chemla <ychemla@nvidia.com>
Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250217191129.19967-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
net_drop_ns() is NULL when CONFIG_NET_NS is disabled.
The next patch introduces a function that increments
and decrements net->passive.
As a prep, let's rename and export net_free() to
net_passive_dec() and add net_passive_inc().
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250217191129.19967-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix incorrect data offset read in the pd692x0_pi_get_pw_limit callback.
The issue was previously unnoticed as it was only used by the regulator
API and not thoroughly tested, since the PSE is mainly controlled via
ethtool.
The function became actively used by ethtool after commit 3e9dbfec4998
("net: pse-pd: Split ethtool_get_status into multiple callbacks"),
which led to the discovery of this issue.
Fix it by using the correct data offset.
Fixes: a87e699c9d33 ("net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250217134812.1925345-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Willem de Bruijn says:
====================
net: deduplicate cookie logic
Reuse standard sk, ip and ipv6 cookie init handlers where possible.
Avoid repeated open coding of the same logic.
Harmonize feature sets across protocols.
Make IPv4 and IPv6 logic more alike.
Simplify adding future new fields with a single init point.
====================
Link: https://patch.msgid.link/20250214222720.3205500-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoid open coding the same logic.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-8-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This initializes tclass and dontfrag before cmsg parsing, removing the
need for explicit checks against -1 in each caller.
Leave hlimit set to -1, because its full initialization
(in ip6_sk_dst_hoplimit) requires more state (dst, flowi6, ..).
This also prepares for calling sockcm_init in a follow-on patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-7-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not modify socket fields if it can be avoided.
The current code predates the introduction of ip cookies in commit
aa6615814533 ("ipv4: processing ancillary IP_TOS or IP_TTL"). Now that
cookies exist and support tos, update that field directly.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-6-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Initialize the ip cookie tos field when initializing the cookie, in
ipcm_init_sk.
The existing code inverts the standard pattern for initializing cookie
fields. Default is to initialize the field from the sk, then possibly
overwrite that when parsing cmsgs (the unlikely case).
This field inverts that, setting the field to an illegal value and
after cmsg parsing checking whether the value is still illegal and
thus should be overridden.
Be careful to always apply mask INET_DSCP_MASK, as before.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-5-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoid open coding the same logic.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoid open coding initialization of sockcm fields.
Avoid reading the sk_priority field twice.
This ensures all callers, existing and future, will correctly try a
cmsg passed mark before sk_mark.
This patch extends support for cmsg mark to:
packet_spkt and packet_tpacket and net/can/raw.c.
This patch extends support for cmsg priority to:
packet_spkt and packet_tpacket.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP only reads the tsflags field. Don't bother initializing others.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As kernel test robot reported, the following warning occurs:
cocci warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c:582:6-8: opportunity for str_enabled_disabled(on)
Replace ternary (condition ? "enabled" : "disabled") with
str_enabled_disabled() from string_choices.h to improve readability,
maintain uniform string usage, and reduce binary size through linker
deduplication.
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Link: https://patch.msgid.link/20250217155833.3105775-1-eleanor15x@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The old_flags variable is declared twice in __dev_change_flags(),
causing a shadow variable warning. This patch fixes the issue by
removing the redundant declaration, reusing the existing old_flags
variable instead.
net/core/dev.c:9225:16: warning: declaration shadows a local variable [-Wshadow]
9225 | unsigned int old_flags = dev->flags;
| ^
net/core/dev.c:9185:15: note: previous declaration is here
9185 | unsigned int old_flags = dev->flags;
| ^
1 warning generated.
Remove the redundant inner declaration and reuse the existing old_flags
variable since its value is not needed outside the if block, and it is
safe to reuse the variable. This eliminates the warning while
maintaining the same functionality.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250217-old_flags-v2-1-4cda3b43a35f@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix few spelling mistakes in net selftests
Signed-off-by: Chandra Mohan Sundar <chandru.dav@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250217141520.81033-1-chandru.dav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add EEE support to MediaTek SoC Ethernet. The register fields are
similar to the ones in MT7531, except that the LPI threshold is in
milliseconds.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250217094022.1065436-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sparse warning:
sparse: symbol 'ugeth_mac_ops' was not declared. Should it be
static.
Add static to fix sparse warnings and add const. phylink_create() will
accept a const struct.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/202502141128.9HfxcdIE-lkp@intel.com
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Heiner Kallweit says:
====================
net: phy: improve and simplify EEE handling in phylib
This series improves and simplifies phylib's EEE handling.
====================
Link: https://patch.msgid.link/3caa3151-13ac-44a8-9bb6-20f82563f698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
genphy_c45_eee_is_active
After the last user has gone, we can remove the local advertisement
parameter from genphy_c45_eee_is_active.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that disabled EEE modes are considered when populating
advertising_eee, we can use this bitmap here instead of reading
the PHY register.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e57ed3d4-d0bc-4f91-83f6-8f48dfb6d7d7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
writing advertisement register
advertising_eee is adjusted now whenever an EEE mode gets disabled.
Therefore we can remove the silent removal of disabled EEE modes here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e95b9dad-24a7-4e3e-9af9-6f0770cf1520@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A PHY driver may populate eee_disabled_modes in its probe or get_features
callback, therefore filter the EEE advertisement read from the PHY.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/493f3e2e-9cfc-445d-adbe-58d9c117a489@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a mode is to be disabled, remove it from advertising_eee.
Disabling EEE modes shall be done before calling phy_start(),
warn if that's not the case.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation of a follow-up patch, move phy_is_started() to before
phy_disable_eee_mode().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We requested in the past that GVE patches coming out of Google should
be submitted only by GVE maintainers. There were too many patches
posted which didn't follow the subsystem guidance.
Recently Joshua was added to maintainers, but even tho he was asked
to follow the netdev "FAQ" in the past [1] he does not follow
the local customs. It is not reasonable for a person who hasn't read
the maintainer entry for the subsystem to be a driver maintainer.
We can re-add once Joshua does some on-list reviews to prove
the fluency with the upstream process.
Link: https://lore.kernel.org/20240610172720.073d5912@kernel.org # [1]
Link: https://patch.msgid.link/20250215162646.2446559-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Realtek shadows standard c45 registers in VEND2 device register space.
Add defines for these VEND2 registers, based on the names of the
standard c45 registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/c90bdf76-f8b8-4d06-9656-7a52d5658ee6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we terminated the dump, the callback isn't running, so cb_running
should be set to false to be logically consistent.
cb_running signifies whether a dump is ongoing. It is set to true in
cb->start(), and is checked in netlink_dump() to be true initially.
After the dump, it is set to false in the same function.
This is just a cleanup, no path should access this field on a closed
socket.
Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Link: https://patch.msgid.link/aff028e3eb2b768b9895fa6349fa1981ae22f098.camel@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.
Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable@vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sean Anderson says:
====================
net: cadence: macb: Modernize statistics reporting
Implement the modern interfaces for statistics reporting.
====================
Link: https://patch.msgid.link/20250214212703.2618652-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Report standard statistics using the dedicated callbacks instead of
get_ethtool_stats.
OCTTX is split over two registers. Accumulating these registers
separately in gem_stats just means we need to combine them again later.
Instead, combine these stats before saving them, like is done for
ethtool_stats.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert the existing get_stats implementation to get_stats64. Since we
now report 64-bit values, increase the counters to 64-bits as well.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement byte queue limits to allow queueing disciplines to account for
packets enqueued in the ring buffers but not yet transmitted.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214211252.2615573-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ufshcd_is_ufs_dev_busy(), ufshcd_print_host_state() and
ufshcd_eh_timed_out() are used in both modes (legacy mode and MCQ mode).
hba->outstanding_reqs only represents the outstanding requests in legacy
mode. Hence, change hba->outstanding_reqs into scsi_host_busy(hba->host) in
these functions.
Fixes: eacb139b77ff ("scsi: ufs: core: mcq: Enable multi-circular queue")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250214224352.3025151-1-bvanassche@acm.org
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Yan Zhai says:
====================
bpf: skip non exist keys in generic_map_lookup_batch
The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitly. Right now
these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.
This patch fixes this behavior by skipping the non-existing key, and
does not return EINTR any more.
V2->V3: deleted a unused macro
V1->V2: split the fix and selftests; fixed a few selftests issues.
V2: https://lore.kernel.org/bpf/cover.1738905497.git.yan@cloudflare.com/
V1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
====================
Link: https://patch.msgid.link/cover.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Iterating through array of maps may encounter non existing keys. The
batch operation should not fail on when this happens.
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/9007237b9606dc2ee44465a4447fe46e13f3bea6.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitly. Right now
these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.
Rather, do not retry in generic_map_lookup_batch. If it finds a non
existing element, skip to the next key. This simple solution comes with
a price that transient errors may not be recovered, and the iteration
might cycle back to the first key under parallel deletion. For example,
Hou Tao <houtao@huaweicloud.com> pointed out a following scenario:
For LPM trie map:
(1) ->map_get_next_key(map, prev_key, key) returns a valid key
(2) bpf_map_copy_value() return -ENOMENT
It means the key must be deleted concurrently.
(3) goto next_key
It swaps the prev_key and key
(4) ->map_get_next_key(map, prev_key, key) again
prev_key points to a non-existing key, for LPM trie it will treat just
like prev_key=NULL case, the returned key will be duplicated.
With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.
However, not all races may be recovered. For example, if current key is
deleted after instead of before bpf_map_copy_value, or if the prev_key
also gets deleted, then the loop will still restart from the first key
for lpm_tire anyway. For generic lookup it might be better to stay
simple, i.e. just skip to the next key. To guarantee that the output
keys are not duplicated, it is better to implement map type specific
batch operations, which can properly lock the trie and synchronize with
concurrent mutators.
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since commit under Fixes we set the window clamp in accordance
to newly measured rcvbuf scaling_ratio. If the scaling_ratio
decreased significantly we may put ourselves in a situation
where windows become smaller than rcvq_space, preventing
tcp_rcv_space_adjust() from increasing rcvbuf.
The significant decrease of scaling_ratio is far more likely
since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
which increased the "default" scaling ratio from ~30% to 50%.
Hitting the bad condition depends a lot on TCP tuning, and
drivers at play. One of Meta's workloads hits it reliably
under following conditions:
- default rcvbuf of 125k
- sender MTU 1500, receiver MTU 5000
- driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
(10 * 5k = 50k). Once we find out the true scaling ratio and
MSS we clamp the windows to 38k. Triggering the condition also
depends on the message sequence of this workload. I can't repro
the problem with simple iperf or TCP_RR-style tests.
Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250217232905.3162187-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is obviously not that important, but when changes are synced back
from the kernel to liburing, the codespell CI ends up erroring because
of this misspelling. Let's just correct it and avoid this biting us
again on an import.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
C22 register space is mapped to 0xa400 in MMD VEND2 register space.
Add a helper to access mapped C22 registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6344277b-c5c7-449b-ac89-d5425306ca76@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
eth: mlx4: use the page pool for Rx buffers
Convert mlx4 to page pool. I've been sitting on these patches for
over a year, and Jonathan Lemon had a similar series years before.
We never deployed it or sent upstream because it didn't really show
much perf win under normal load (admittedly I think the real testing
was done before Ilias's work on recycling).
During the v6.9 kernel rollout Meta's CDN team noticed that machines
with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
spent mapping buffers in the IOMMU). The problem does not occur with
modern NICs, so I dusted off this series and reportedly it still works.
And it makes the problem go away, no overloads, perf back in line with
older kernels. Something must have changed in IOMMU code, I guess.
This series is very simple, and can very likely be optimized further.
Thing is, I don't have access to any CX3 Pro NICs. They only exist
in CDN locations which haven't had a HW refresh for a while. So I can
say this series survives a week under traffic w/ XDP enabled, but
my ability to iterate and improve is a bit limited.
v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org
v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250213010635.1354034-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simple conversion to page pool. Preserve the current fragmentation
logic / page splitting. Each page starts with a single frag reference,
and then we bump that when attaching to skbs. This can likely be
optimized further.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It will be replaced with page pool's built-in recycling.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|