2025-07-18ice: add E835 device IDsDawid Osuchowski
E835 is an enhanced version of the E830. It continues to use the same set of commands, registers and interfaces as other devices in the 800 Series.

The following device IDs are added:
- 0x1248: Intel(R) Ethernet Controller E835-CC for backplane
- 0x1249: Intel(R) Ethernet Controller E835-CC for QSFP
- 0x124A: Intel(R) Ethernet Controller E835-CC for SFP
- 0x1261: Intel(R) Ethernet Controller E835-C for backplane
- 0x1262: Intel(R) Ethernet Controller E835-C for QSFP
- 0x1263: Intel(R) Ethernet Controller E835-C for SFP
- 0x1265: Intel(R) Ethernet Controller E835-L for backplane
- 0x1266: Intel(R) Ethernet Controller E835-L for QSFP
- 0x1267: Intel(R) Ethernet Controller E835-L for SFP

Reviewed-by: Konrad Knitter <konrad.knitter@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18ice: add 40G speed to Admin Command GET PORT OPTIONAleksandr Loktionov
Introduce the ICE_AQC_PORT_OPT_MAX_LANE_40G constant and update the code to process this new option in both the devlink and the Admin Queue Command GET PORT OPTION (opcode 0x06EA) message, similar to existing constants like ICE_AQC_PORT_OPT_MAX_LANE_50G, ICE_AQC_PORT_OPT_MAX_LANE_100G, and so on.

This feature allows the driver to correctly report configuration options for 2x40G on E823 and other cards in the future via devlink.

Example command:
  devlink port split pci/0000:01:00.0/0 count 2

Example dmesg:
  ice 0000:01:00.0: Available port split options and max port speeds (Gbps):
  ice 0000:01:00.0: Status  Split      Quad 0          Quad 1
  ice 0000:01:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
  ice 0000:01:00.0:         2      40   -   -   -  40   -   -   -
  ice 0000:01:00.0:         2      50   -  50   -   -   -   -   -
  ice 0000:01:00.0:         4      25  25  25  25   -   -   -   -
  ice 0000:01:00.0:         4      25  25   -   -  25  25   -   -
  ice 0000:01:00.0: Active  8      10  10  10  10  10  10  10  10
  ice 0000:01:00.0:         1     100   -   -   -   -   -   -   -

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: preserve coalescing settings across resetsAhmed Zaki
The IRQ coalescing config currently resides only inside struct idpf_q_vector. However, all idpf_q_vector structs are de-allocated and re-allocated during resets, so any user-set coalesce configuration is lost. Add new fields to struct idpf_vport_user_config_data to save the user settings and re-apply them after reset. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
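The save-and-reapply idea can be modelled outside the kernel. The following is a minimal Python sketch, not the driver code: the class names, the default of 50 usecs and the per-vector dictionary are illustrative assumptions; only the pattern (vectors are rebuilt on reset, user settings live in a structure that survives) comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class QVector:
    """Per-vector state; thrown away and rebuilt on every reset."""
    rx_usecs: int = 50  # hypothetical default, not taken from the driver
    tx_usecs: int = 50


@dataclass
class UserConfig:
    """Survives resets; records only what the user explicitly set."""
    coalesce: dict = field(default_factory=dict)  # vector index -> (rx, tx)


class Vport:
    def __init__(self, num_vectors):
        self.user_config = UserConfig()
        self.vectors = [QVector() for _ in range(num_vectors)]

    def set_coalesce(self, idx, rx_usecs, tx_usecs):
        # Apply to the live vector and remember the request for later resets.
        self.vectors[idx] = QVector(rx_usecs, tx_usecs)
        self.user_config.coalesce[idx] = (rx_usecs, tx_usecs)

    def reset(self):
        # Vectors are re-allocated with defaults, then saved settings re-applied.
        self.vectors = [QVector() for _ in self.vectors]
        for idx, (rx, tx) in self.user_config.coalesce.items():
            self.vectors[idx] = QVector(rx, tx)
```

Without the `user_config` copy, `reset()` would silently return every vector to its default.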
2025-07-18idpf: add cross timestampingMilena Olech
Add cross timestamp support through virtchnl mailbox messages and directly, through PCIe BAR registers. Cross timestamping assumes that both system time and device clock time values are cached simultaneously, which is triggered by HW. The feature is enabled for both ARM and x86 archs. Signed-off-by: Milena Olech <milena.olech@intel.com> Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: add flow steering supportAhmed Zaki
Use the new virtchnl2 OP codes to communicate with the Control Plane to add flow steering filters. We add the basic functionality for add/delete with TCP/UDP IPv4 only. Support for other OP codes and protocols will be added later.

Standard 'ethtool -N|--config-ntuple' should be used, for example:

    # ethtool -N ens801f0d1 flow-type tcp4 src-ip 10.0.0.1 action 6

to route all IPv4/TCP traffic from IP 10.0.0.1 to queue 6.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: add flow steering supportSudheer Mogilappagari
Add opcodes and corresponding message structure to add and delete flow steering rules. Flow steering enables configuration of rules to take an action or subset of actions based on match criteria. Actions could be redirect to queue, redirect to queue group, drop packet or mark. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Co-developed-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: rename enum virtchnl2_cap_rssAhmed Zaki
The "enum virtchnl2_cap_rss" will be used for negotiating flow steering capabilities. Instead of adding a new enum, rename virtchnl2_cap_rss to virtchnl2_flow_types. Also rename the enum's constants. Flow steering will use this enum in the next patches. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-17et131x: Add missing check after DMA mapThomas Fourier
The DMA map functions can fail and should be tested for errors. If the mapping fails, unmap and return an error. Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Acked-by: Mark Einon <mark.einon@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250716094733.28734-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: ag71xx: Add missing check after DMA mapThomas Fourier
The DMA map functions can fail and should be tested for errors. Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250716095733.37452-3-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17selftests/drivers/net: Support ipv6 for napi_id testTianyi Cui
Add support for IPv6 environment for napi_id test.

Test Plan:

    ./run_kselftest.sh -t drivers/net:napi_id.py
    TAP version 13
    1..1
    # timeout set to 45
    # selftests: drivers/net: napi_id.py
    # TAP version 13
    # 1..1
    # ok 1 napi_id.test_napi_id
    # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    ok 1 selftests: drivers/net: napi_id.py

Signed-off-by: Tianyi Cui <1997cui@gmail.com>
Link: https://patch.msgid.link/20250717011913.1248816-1-1997cui@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reportingMingming Cao
VNIC testing on multi-core Power systems showed SAR stats drift and packet rate inconsistencies under load.

Implement ndo_get_stats64 to provide safe aggregation of queue-level atomic64 counters into rtnl_link_stats64 for use by tools like 'ip -s', 'ifconfig', and 'sar'. Switch to ndo_get_stats64 to align SAR reporting with the standard kernel interface for retrieving netdev stats.

This removes redundant per-adapter stat updates, reduces overhead, eliminates cacheline bouncing from hot path updates, and improves the accuracy of reported packet rates.

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>

----
Changes since v3:
link to v3: https://www.spinics.net/lists/netdev/msg1107999.html
-- keep per queue counters as u64 (this patch) and drop off patch 1 in v3

Link: https://patch.msgid.link/20250716152115.61143-1-mmc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
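The aggregation pattern behind ndo_get_stats64 can be illustrated with a small Python model. This is a sketch under assumptions, not the ibmvnic code: the `QueueStats` type and the `get_stats64` helper are hypothetical names, and the returned keys merely mirror a few rtnl_link_stats64 fields. The point is that the hot path touches only per-queue counters, and the adapter-wide totals are computed when a tool asks for them.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Per-queue counters, updated only by the queue that owns them."""
    packets: int = 0
    bytes: int = 0


def get_stats64(rx_queues, tx_queues):
    # Sum the per-queue counters on demand instead of maintaining a shared
    # per-adapter total in the hot path.
    return {
        "rx_packets": sum(q.packets for q in rx_queues),
        "rx_bytes": sum(q.bytes for q in rx_queues),
        "tx_packets": sum(q.packets for q in tx_queues),
        "tx_bytes": sum(q.bytes for q in tx_queues),
    }
```

For example, two RX queues holding 3 and 4 packets yield `rx_packets == 7` at read time, with no shared counter written per packet.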
2025-07-17Merge branch 'net-mlx5-misc-changes-2025-07-16'Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5: misc changes 2025-07-16 This series contains misc enhancements to the mlx5 driver. v1: https://lore.kernel.org/1752471585-18053-1-git-send-email-tariqt@nvidia.com ==================== Link: https://patch.msgid.link/1752675472-201445-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net/mlx5e: Properly access RCU protected qdisc_sleeping variableLeon Romanovsky
qdisc_sleeping variable is declared as "struct Qdisc __rcu" and as such needs proper annotation while accessing it.

Without rtnl_dereference(), the following error is generated by sparse:

  drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning: incorrect type in initializer (different address spaces)
  drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: expected struct Qdisc *qdisc
  drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: got struct Qdisc [noderef] __rcu *qdisc_sleeping

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net/mlx5e: fix kdoc warning on eswitch.hMoshe Shemesh
Fix the following kdoc warning:

  git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/mlx5/core/ |\
  xargs scripts/kernel-doc --none
  drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:824: warning: cannot understand function prototype: 'struct mlx5_esw_event_info '

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net/mlx5: HWS, Enable IPSec hardware offload in legacy modeLama Kayal
IPSec hardware offload in legacy mode should not be affected by the steering mode, hence it should also work properly with hmfs mode. Remove steering mode validation when calculating the cap for packet offload, this will also enable the missing cap MLX5_IPSEC_CAP_PRIO needed for crypto offload. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1752675472-201445-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: pcs: xpcs: mask readl() return value to 16 bitsJack Ping CHNG
readl() returns 32-bit value but Clause 22/45 registers are 16-bit wide. Masking with 0xFFFF avoids using garbage upper bits. Signed-off-by: Jack Ping CHNG <jchng@maxlinear.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250716030349.3796806-1-jchng@maxlinear.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
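As a rough illustration of the masking step, here is a Python model. The `readl` argument is a stand-in callable rather than the kernel accessor, and `read_reg16` is a hypothetical helper name; only the 0xFFFF mask comes from the commit message.

```python
def read_reg16(readl, addr):
    """Return the 16-bit register value from a 32-bit bus read.

    `readl` is any callable returning a 32-bit integer for `addr`.
    The upper 16 bits carry no register data and are discarded.
    """
    return readl(addr) & 0xFFFF
```

A read that returns `0xDEAD1140` therefore yields `0x1140`; without the mask the caller would compare against a value polluted by the upper half.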
2025-07-17net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node()Dan Carpenter
The __esw_qos_alloc_node() function returns NULL on error. It doesn't return error pointers. Update the error checking to match. Fixes: 96619c485fa6 ("net/mlx5: Add support for setting tc-bw on nodes") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/0ce4ec2a-2b5d-4652-9638-e715a99902a7@sabinyo.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: ethernet: mtk_wed: Fix NULL vs IS_ERR() bug in mtk_wed_get_memory_region()Dan Carpenter
We recently changed this from using devm_ioremap() to using devm_ioremap_resource() and unfortunately the former returns NULL while the latter returns error pointers. The check for errors needs to be updated as well. Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/87c10dbd-df86-4971-b4f5-40ba02c076fb@sabinyo.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
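Why the two error conventions cannot be mixed is easier to see with numbers. The sketch below models the kernel's error-pointer encoding in Python, assuming 64-bit pointers; the helper names mirror the kernel macros ERR_PTR(), IS_ERR() and PTR_ERR(), but this is an illustration, not kernel code.

```python
MAX_ERRNO = 4095
PTR_BITS = 64                      # assumption: 64-bit pointers
PTR_MASK = (1 << PTR_BITS) - 1
ENOMEM = 12
NULL = 0


def ERR_PTR(err):
    # A negative errno stored in a pointer lands in the top 4095 addresses.
    return err & PTR_MASK


def IS_ERR(ptr):
    # True only for values in the error range; NULL (0) is NOT an error here.
    return ptr >= ((-MAX_ERRNO) & PTR_MASK)


def PTR_ERR(ptr):
    # Recover the negative errno from an error pointer.
    return ptr - (1 << PTR_BITS)
```

An error pointer is non-NULL, so a `!ptr` check lets it through; NULL is outside the error range, so an `IS_ERR()` check lets NULL through. Each function needs the check that matches what it actually returns.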
2025-07-17net: airoha: Fix a NULL vs IS_ERR() bug in airoha_npu_run_firmware()Dan Carpenter
The devm_ioremap_resource() function returns error pointers. It never returns NULL. Update the check to match. Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/fc6d194e-6bf5-49ca-bc77-3fdfda62c434@sabinyo.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge branch 'add-shared-phy-counter-support-for-qca807x-and-qca808x'Jakub Kicinski
Luo Jie says: ==================== Add shared PHY counter support for QCA807x and QCA808x The implementation of the PHY counter is identical for both QCA808x and QCA807x series devices. This includes counters for both good and bad CRC frames in the RX and TX directions, which are active when CRC checking is enabled. This patch series introduces PHY counter functions into a shared library, enabling counter support for the QCA808x and QCA807x families through this common infrastructure. Additionally, enable CRC checking and configure automatic clearing of counters after reading within config_init() to ensure accurate counter recording. v2: https://lore.kernel.org/20250714-qcom_phy_counter-v2-0-94dde9d9769f@quicinc.com v1: https://lore.kernel.org/20250709-qcom_phy_counter-v1-0-93a54a029c46@quicinc.com ==================== Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-0-8b0e460a527b@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: phy: qcom: qca807x: Support PHY counterLuo Jie
Within the QCA807X PHY operation's config_init() function, enable CRC checking for received and transmitted frames and configure counter to clear after being read to support counter recording. Additionally, add support for PHY counter operations. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-3-8b0e460a527b@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: phy: qcom: qca808x: Support PHY counterLuo Jie
Enable CRC checking for received and transmitted frames, and configure counters to clear after being read within config_init() for accurate counter recording. Additionally, add PHY counter operations and integrate shared functions. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-2-8b0e460a527b@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: phy: qcom: Add PHY counter supportLuo Jie
Add PHY counter functionality to the shared library. The implementation is identical for the current QCA807X and QCA808X PHYs. The PHY counter can be configured to perform CRC checking for both received and transmitted packets. Additionally, the packet counter can be set to automatically clear after it is read. The PHY counter includes 32-bit packet counters for both RX (received) and TX (transmitted) packets, as well as 16-bit counters for recording CRC error packets for both RX and TX. Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-1-8b0e460a527b@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
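A clear-on-read counter is usually paired with a software accumulator. The following Python sketch shows that pairing under stated assumptions: the class names are hypothetical, and the wrap-around behaviour on overflow is an assumption for illustration (the commit message gives only the widths, 32-bit for packets and 16-bit for CRC errors, and the clear-after-read option).

```python
class ClearOnReadCounter:
    """Model of a hardware counter that resets to zero when read."""

    def __init__(self, width_bits):
        self._mask = (1 << width_bits) - 1
        self._value = 0

    def count(self, n=1):
        # Assumption: the counter wraps at its width; real hardware may saturate.
        self._value = (self._value + n) & self._mask

    def read(self):
        value, self._value = self._value, 0
        return value


class PhyStats:
    """Software totals accumulated from clear-on-read hardware counters."""

    def __init__(self):
        self.hw_rx_packets = ClearOnReadCounter(32)
        self.hw_rx_crc_errors = ClearOnReadCounter(16)
        self.rx_packets = 0
        self.rx_crc_errors = 0

    def update(self):
        # Each read returns only what arrived since the previous read.
        self.rx_packets += self.hw_rx_packets.read()
        self.rx_crc_errors += self.hw_rx_crc_errors.read()
```

Because every read returns a delta, the software totals can be wider than the hardware registers as long as `update()` runs before a register overflows.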
2025-07-17netdevsim: remove redundant branchDennis Chen
bool notify is referenced nowhere else in the function except to check whether or not to call rtnl_offload_xstats_notify(). Remove it and move the call to the previous branch. Signed-off-by: Dennis Chen <dechen@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250716165750.561175-1-dechen@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Jakub Kicinski
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-17

We've added 13 non-merge commits during the last 20 day(s) which contain a total of 4 files changed, 712 insertions(+), 84 deletions(-).

The main changes are:

1) Avoid skipping or repeating a sk when using a TCP bpf_iter, from Jordan Rife.

2) Clarify the driver requirement on using the XDP metadata, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  doc: xdp: Clarify driver implementation for XDP Rx metadata
  selftests/bpf: Add tests for bucket resume logic in established sockets
  selftests/bpf: Create iter_tcp_destroy test program
  selftests/bpf: Create established sockets in socket iterator tests
  selftests/bpf: Make ehash buckets configurable in socket iterator tests
  selftests/bpf: Allow for iteration over multiple states
  selftests/bpf: Allow for iteration over multiple ports
  selftests/bpf: Add tests for bucket resume logic in listening sockets
  bpf: tcp: Avoid socket skips and repeats during iteration
  bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
  bpf: tcp: Get rid of st_bucket_done
  bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
  bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
====================

Link: https://patch.msgid.link/20250717191731.4142326-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17selftests: net: prevent Python from buffering the outputJakub Kicinski
Make sure Python doesn't buffer the output, otherwise for some tests we may see false positive timeouts in NIPA. NIPA thinks that a machine has hung if the test doesn't print anything for 3min. This is also nice to have for running the tests manually, especially in vng. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250716205712.1787325-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
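The commit message does not say which mechanism the patch uses. One standard way to disable Python's output buffering for a child process is the PYTHONUNBUFFERED environment variable, which is equivalent to passing `-u` to the interpreter. The sketch below shows that approach; the helper name `unbuffered_env` is hypothetical.

```python
import os


def unbuffered_env(base_env=None):
    """Return a copy of the environment with Python output buffering disabled.

    Any non-empty PYTHONUNBUFFERED value makes a child interpreter behave
    as if it had been started with `-u`.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONUNBUFFERED"] = "1"
    return env
```

A test runner could pass the result as `env=` to `subprocess.Popen`, so each line a test prints reaches the watcher immediately instead of sitting in a pipe buffer.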
2025-07-17Merge branch 'neighbour-convert-rtm_getneigh-to-rcu-and-make-pneigh-rtnl-free'Jakub Kicinski
Kuniyuki Iwashima says: ==================== neighbour: Convert RTM_GETNEIGH to RCU and make pneigh RTNL-free. This is kind of v3 of the series below [0] but without NEIGHTBL patches. Patch 1 ~ 4 and 9 come from the series to convert RTM_GETNEIGH to RCU. Other patches clean up pneigh_lookup() and convert the pneigh code to RCU + private mutex so that we can easily remove RTNL from RTM_NEWNEIGH in the later series. [0]: https://lore.kernel.org/netdev/20250418012727.57033-1-kuniyu@amazon.com/ v2: https://lore.kernel.org/20250712203515.4099110-1-kuniyu@google.com v1: https://lore.kernel.org/20250711191007.3591938-1-kuniyu@google.com ==================== Link: https://patch.msgid.link/20250716221221.442239-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Update pneigh_entry in pneigh_create().Kuniyuki Iwashima
neigh_add() updates the pneigh_entry found or created by pneigh_create(). This update is serialised by RTNL, but we will remove it. Let's move the update part to pneigh_create() and make it return errno instead of a pointer to pneigh_entry. Now, the pneigh code is RTNL free. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-16-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.Kuniyuki Iwashima
tbl->phash_buckets[] is only modified in the slow path by pneigh_create() and pneigh_delete() under the table lock. Both of them are called under RTNL, so no extra lock is needed, but we will remove RTNL from the paths.

pneigh_create() looks up a pneigh_entry, and this part can be lockless, but it would complicate the logic like

  1. lookup
  2. allocate pneigh_entry with GFP_KERNEL
  3. lookup again but under lock
  4. if found, return it after freeing the allocated memory
  5. else, return the new one

Instead, let's add a per-table mutex and run lookup and allocation under it.

Note that updating pneigh_entry part in neigh_add() is still protected by RTNL and will be moved to pneigh_create() in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
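The simpler scheme the commit chooses, lookup and allocation under one per-table mutex, can be sketched in Python. This is a model under assumptions: `ProxyTable`, its dictionary storage and the entry layout are illustrative, and the plain dictionary read in `lookup()` only stands in for the kernel's lockless RCU reader.

```python
import threading


class ProxyTable:
    """Model of a table whose writers are serialised by a dedicated mutex."""

    def __init__(self):
        self._lock = threading.Lock()   # per-table mutex, writers only
        self._entries = {}

    def lookup(self, key):
        # Reader path: no mutex taken (stands in for an RCU lookup).
        return self._entries.get(key)

    def create(self, key):
        # Lookup and allocation happen under the same lock, so there is no
        # window in which two callers both allocate an entry for one key.
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = {"key": key, "flags": 0}
                self._entries[key] = entry
            return entry
```

Holding the mutex across both steps removes the need for the five-step "allocate, re-check, maybe free" sequence listed in the commit message, at the cost of allocating while the lock is held.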
2025-07-17neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_lookup().Kuniyuki Iwashima
Now, all callers of pneigh_lookup() are under RCU, and the read lock there is no longer needed. Let's drop the lock, inline __pneigh_lookup_1() to pneigh_lookup(), and call it from pneigh_create(). The next patch will remove tbl->lock from pneigh_create(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Remove __pneigh_lookup().Kuniyuki Iwashima
__pneigh_lookup() is the lockless version of pneigh_lookup(), but its only caller pndisc_is_router() holds the table lock and reads pneigh_entry.flags. This is because accessing pneigh_entry after pneigh_lookup() was illegal unless the caller holds RTNL or the table lock. Now, pneigh_entry is guaranteed to be alive during the RCU critical section. Let's call pneigh_lookup() and use READ_ONCE() for n->flags in pndisc_is_router() and remove __pneigh_lookup(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Use rcu_dereference() in pneigh_get_{first,next}().Kuniyuki Iwashima
Now pneigh_entry is guaranteed to be alive during the RCU critical section even without holding tbl->lock. Let's use rcu_dereference() in pneigh_get_{first,next}(). Note that neigh_seq_start() still holds tbl->lock for the normal neighbour entry. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_dump_table().Kuniyuki Iwashima
Now pneigh_entry is guaranteed to be alive during the RCU critical section even without holding tbl->lock. Let's drop read_lock_bh(&tbl->lock) and use rcu_dereference() to iterate tbl->phash_buckets[] in pneigh_dump_table() Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Convert RTM_GETNEIGH to RCU.Kuniyuki Iwashima
__dev_get_by_index() is the only RTNL-dependent call in neigh_get(). Let's replace it with dev_get_by_index_rcu() and convert RTM_GETNEIGH to RCU. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Annotate access to struct pneigh_entry.{flags,protocol}.Kuniyuki Iwashima
We will convert pneigh readers to RCU, and its flags and protocol will be read locklessly. Let's annotate the access to the two fields. Note that all access to pn->permanent is under RTNL (neigh_add() and pneigh_ifdown_and_unlock()), so WRITE_ONCE() and READ_ONCE() are not needed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Free pneigh_entry after RCU grace period.Kuniyuki Iwashima
We will convert RTM_GETNEIGH to RCU. neigh_get() looks up pneigh_entry by pneigh_lookup() and passes it to pneigh_fill_info(). Then, we must ensure that the entry is alive till pneigh_fill_info() completes, but read_lock_bh(&tbl->lock) in pneigh_lookup() does not guarantee that. Also, we will convert all readers of tbl->phash_buckets[] to RCU. Let's use call_rcu() to free pneigh_entry and update phash_buckets[] and ->next by rcu_assign_pointer(). pneigh_ifdown_and_unlock() uses list_head to avoid overwriting ->next and moving RCU iterators to another list. pndisc_destructor() (only IPv6 ndisc uses this) uses a mutex, so it is not delayed to call_rcu(), where we cannot sleep. This is fine because the mcast code works with RCU and ipv6_dev_mc_dec() frees mcast objects after RCU grace period. While at it, we change the return type of pneigh_ifdown_and_unlock() to void. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.Kuniyuki Iwashima
The next patch will free pneigh_entry with call_rcu(). Then, we need to annotate neigh_table.phash_buckets[] and pneigh_entry.next with __rcu. To make the next patch cleaner, let's annotate the fields in advance. Currently, all accesses to the fields are under the neigh table lock, so rcu_dereference_protected() is used with 1 for now, but most of them (except in pneigh_delete() and pneigh_ifdown_and_unlock()) will be replaced with rcu_dereference() and rcu_dereference_check(). Note that pneigh_ifdown_and_unlock() changes pneigh_entry.next to a local list, which is illegal because the RCU iterator could be moved to another list. This part will be fixed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Split pneigh_lookup().Kuniyuki Iwashima
pneigh_lookup() has ASSERT_RTNL() in the middle of the function, which is confusing. When called with the last argument, creat, 0, pneigh_lookup() literally looks up a proxy neighbour entry. This is the case of the reader path as the fast path and RTM_GETNEIGH. pneigh_lookup(), however, creates a pneigh_entry when called with creat 1 from RTM_NEWNEIGH and SIOCSARP, which require RTNL. Let's split pneigh_lookup() into two functions. We will convert all the reader paths to RCU, and read_lock_bh(&tbl->lock) in the new pneigh_lookup() will be dropped. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Move neigh_find_table() to neigh_get().Kuniyuki Iwashima
neigh_valid_get_req() calls neigh_find_table() to fetch neigh_tables[]. neigh_find_table() uses rcu_dereference_rtnl(), but RTNL actually does not protect it at all; neigh_table_clear() can be called without RTNL and only waits for RCU readers by synchronize_rcu(). Fortunately, there is no bug because IPv4 is built-in, IPv6 cannot be unloaded, and DECNET was removed. To fetch neigh_tables[] by rcu_dereference() later, let's move neigh_find_table() from neigh_valid_get_req() to neigh_get(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Allocate skb in neigh_get().Kuniyuki Iwashima
We will remove RTNL for neigh_get() and run it under RCU instead. neigh_get_reply() and pneigh_get_reply() allocate skb with GFP_KERNEL. Let's move the allocation before __dev_get_by_index() in neigh_get(). Now, neigh_get_reply() and pneigh_get_reply() are inlined and rtnl_unicast() is factorised. We will convert pneigh_lookup() to __pneigh_lookup() later. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Move two validations from neigh_get() to neigh_valid_get_req().Kuniyuki Iwashima
We will remove RTNL for neigh_get() and run it under RCU instead.

neigh_get() returns -EINVAL in the following cases:

  * NDA_DST is not specified
  * Both ndm->ndm_ifindex and NTF_PROXY are not specified

These validations do not require RCU. Let's move them to neigh_valid_get_req().

While at it, the extack string for the first case is replaced with NL_SET_ERR_ATTR_MISS().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
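The two checks are pure request validation, which is why they can run before any RCU section. A minimal Python model follows; the function name `validate_get_req` and the dictionary of attributes are assumptions for illustration, while NTF_PROXY (0x08) and the -EINVAL result follow the kernel's uAPI and the commit message.

```python
import errno

NTF_PROXY = 0x08  # value of the uAPI flag in linux/neighbour.h


def validate_get_req(attrs, ndm_ifindex, ndm_flags):
    """Return 0 if the RTM_GETNEIGH request is well formed, else -EINVAL."""
    # Case 1: the destination address attribute is mandatory.
    if "NDA_DST" not in attrs:
        return -errno.EINVAL
    # Case 2: the request must name a device or be a proxy lookup.
    if not ndm_ifindex and not (ndm_flags & NTF_PROXY):
        return -errno.EINVAL
    return 0
```

Neither check looks at kernel state, only at the message itself, so both can fail fast before a device or table is looked up.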
2025-07-17neighbour: Make neigh_valid_get_req() return ndmsg.Kuniyuki Iwashima
neigh_get() passes 4 local variable pointers to neigh_valid_get_req(). If it returns a pointer to struct ndmsg, we do not need to pass two of them. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge branch 'ethtool-rss-support-rss_set-via-netlink'Jakub Kicinski
Jakub Kicinski says: ==================== ethtool: rss: support RSS_SET via Netlink Support configuring RSS settings via Netlink. Creating and removing contexts remains for the following series. v2: https://lore.kernel.org/20250714222729.743282-1-kuba@kernel.org v1: https://lore.kernel.org/20250711015303.3688717-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250716000331.1378807-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17selftests: drv-net: rss_api: test input-xfrm and hash fieldsJakub Kicinski
Test configuring input-xfrm and hash fields with all the limitations.

Tested on mlx5 (CX6):

  # ./ksft-net-drv/drivers/net/hw/rss_api.py
  TAP version 13
  1..10
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  ok 8 rss_api.test_rxfh_fields_set
  ok 9 rss_api.test_rxfh_fields_set_xfrm
  ok 10 rss_api.test_rxfh_fields_ntf
  # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ethtool: rss: support setting flow hashing fieldsJakub Kicinski
Add support for ETHTOOL_SRXFH (setting hashing fields) in RSS_SET. The tricky part is dealing with symmetric hashing. In netlink user can change the hashing fields and symmetric hash in one request, in IOCTL the two used to be set via different uAPI requests. Since fields and hash function config are still separate driver callbacks - changes to the two are not atomic. Keep things simple and validate the settings against both pre- and post- change ones. Meaning that we will reject the config request if user tries to correct the flow fields and set input_xfrm in one request, or disables input_xfrm and makes flow fields non-symmetric. We can adjust it later if there's a real need. Starting simple feels right, and potentially partially applying the settings isn't nice, either. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
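The "validate against both pre- and post-change settings" rule can be written down as a small check. The Python sketch below is a model, not the ethtool code: the function names are hypothetical, and the notion of a symmetric field set (source and destination bits set or cleared together) is an assumption about what the symmetric transform requires. Field and transform names are borrowed from the YNL output shown elsewhere in this log.

```python
# Source/destination field pairs that must be hashed together, or not at all.
SYM_PAIRS = (("ip-src", "ip-dst"), ("l4-b-0-1", "l4-b-2-3"))


def fields_symmetric(fields):
    return all((src in fields) == (dst in fields) for src, dst in SYM_PAIRS)


def rss_set_allowed(old_xfrm, old_fields, new_xfrm, new_fields):
    """Accept only if no symmetric transform ever meets asymmetric fields.

    The two settings reach the driver through separate callbacks, so every
    combination of old/new transform with old/new fields may briefly exist.
    An empty transform set means symmetric hashing is off.
    """
    for xfrm in (old_xfrm, new_xfrm):
        if not xfrm:
            continue
        for fields in (old_fields, new_fields):
            if not fields_symmetric(fields):
                return False
    return True
```

Both rejected cases from the commit message fall out of this rule: fixing the fields while enabling the transform fails on (new transform, old fields), and disabling the transform while breaking symmetry fails on (old transform, new fields).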
2025-07-17ethtool: rss: support setting input-xfrm via NetlinkJakub Kicinski
Support configuring symmetric hashing via Netlink. We have the flow field config prepared as part of SET handling, so scan it for conflicts instead of querying the driver again. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17netlink: specs: define input-xfrm enum in the specJakub Kicinski
Help YNL decode the values for input-xfrm by defining the possible values in the spec. Don't define "no change" as it's an IOCTL artifact with no use in Netlink.

With this change on mlx5 input-xfrm gets decoded:

 # ynl --family ethtool --dump rss-get
 [{'header': {'dev-index': 2, 'dev-name': 'eth0'},
   'hfunc': 1,
   'hkey': b'V\xa8\xf9\x9 ...',
   'indir': [0, 1, ... ],
   'input-xfrm': {'sym-or-xor'},                         <<<
   'flow-hash': {'ah4': {'ip-dst', 'ip-src'},
                 'ah6': {'ip-dst', 'ip-src'},
                 'esp4': {'ip-dst', 'ip-src'},
                 'esp6': {'ip-dst', 'ip-src'},
                 'ip4': {'ip-dst', 'ip-src'},
                 'ip6': {'ip-dst', 'ip-src'},
                 'tcp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'tcp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'}}
 }]

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17selftests: drv-net: rss_api: test setting hashing key via NetlinkJakub Kicinski
Test setting hashing key via Netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..7
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ethtool: rss: support setting hkey via NetlinkJakub Kicinski
Support setting RSS hashing key via ethtool Netlink. Use the Netlink policy to make sure user doesn't pass an empty key, "resetting" the key is not a thing. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ethtool: rss: support setting hfunc via NetlinkJakub Kicinski
Support setting RSS hash function / algo via ethtool Netlink. Like IOCTL we don't validate that the function is within the range known to the kernel. The drivers do a pretty good job validating the inputs, and the IDs are technically "dynamically queried" rather than part of uAPI. Only change should be that in Netlink we don't support user explicitly passing ETH_RSS_HASH_NO_CHANGE (0), if no change is requested the attribute should be absent. The ETH_RSS_HASH_NO_CHANGE is retained in driver-facing API for consistency (not that I see a strong reason for it). Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
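The difference between "attribute absent" and "attribute set to 0" can be modelled in a few lines of Python. This is a sketch under assumptions: `parse_hfunc` and the attribute dictionary are hypothetical, and raising `ValueError` stands in for whatever error the kernel returns; only ETH_RSS_HASH_NO_CHANGE being 0 and the absent-means-no-change rule come from the commit message.

```python
ETH_RSS_HASH_NO_CHANGE = 0  # still used on the driver-facing side


def parse_hfunc(attrs):
    """Translate an optional Netlink 'hfunc' attribute for the driver call.

    Absent attribute -> no change requested.
    Explicit 0       -> rejected; Netlink users must omit the attribute.
    Any other value  -> passed through; range checking is left to the driver.
    """
    if "hfunc" not in attrs:
        return ETH_RSS_HASH_NO_CHANGE
    value = attrs["hfunc"]
    if value == ETH_RSS_HASH_NO_CHANGE:
        raise ValueError("hfunc must be omitted when no change is requested")
    return value
```

Keeping the sentinel out of the Netlink interface means a request either carries a real hash function ID or says nothing about it.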