summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-21net/mlx5: HWS, handle modify header actions dependencyYevgeny Kliteynik
Having adjacent accelerated modify header actions (so-called pattern-argument actions) may result in inconsistent outcome. These inconsistencies can take the form of writes to the same field or a read coupled with a write to the same field. The solution is to detect such dependencies and insert nops between the offending actions. The existing implementation had a few issues, which pretty much required a complete rewrite of the code that handles these dependencies. In the new implementation we're doing the following: * Checking any two adjacent actions for conflicts (not just odd-even pairs). * Marking 'set' and 'add' action fields as destination, rather than source, for the purposes of checking for conflicts. * Checking all types of actions ('add', 'set', 'copy') for dependencies. * Managing offsets of the args in the buffer - copy the action args to the right place in the buffer. * Checking that after inserting nops we're still within the number of supported actions - return an error otherwise. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747766802-958178-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net/mlx5: HWS, fix typo - 'nope' to 'nop'Yevgeny Kliteynik
Fix typo - rename 'nope_locations' to 'nop_locations', which describes the locations of 'nop' actions. To shorten the lines, this renaming also required some refactoring. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747766802-958178-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net/mlx5: HWS, register reformat actions with fwVlad Dogaru
Hardware steering handles actions differently from firmware, but for termination rules that use encapsulation the firmware needs to be aware of the action. Fix this by registering reformat actions with the firmware the first time this is needed. To do this, add a third possible owner for an action, and also a lock to protect against registration of the same action from different threads. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747766802-958178-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net/mlx5: SWS, fix reformat id error handlingVlad Dogaru
The firmware reformat id is a u32 and can't safely be returned as an int. Because the functions also need a way to signal error, prefer to return the id as an output parameter and keep the return code only for success/error. While we're at it, also extract some duplicate code to fetch the reformat id from a more generic struct pkt_reformat. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1747766802-958178-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: add debug checks in ____napi_schedule() and napi_poll()Eric Dumazet
While tracking an IDPF bug, I found that idpf_vport_splitq_napi_poll() was not following NAPI rules. It can indeed return @budget after napi_complete() has been called. Add two debug conditions in networking core to hopefully catch this kind of bugs sooner. IDPF bug will be fixed in a separate patch. [ 72.441242] repoll requested for device eth1 idpf_vport_splitq_napi_poll [idpf] but napi is not scheduled. [ 72.446291] list_del corruption. next->prev should be ff31783d93b14040, but was ff31783d93b10080. (next=ff31783d93b10080) [ 72.446659] kernel BUG at lib/list_debug.c:67! [ 72.446816] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 72.447031] CPU: 156 UID: 0 PID: 16258 Comm: ip Tainted: G W 6.15.0-dbg-DEV #1944 NONE [ 72.447340] Tainted: [W]=WARN [ 72.447702] RIP: 0010:__list_del_entry_valid_or_report (lib/list_debug.c:65) [ 72.450630] Call Trace: [ 72.450720] <IRQ> [ 72.450797] net_rx_action (include/linux/list.h:215 include/linux/list.h:287 net/core/dev.c:7385 net/core/dev.c:7516) [ 72.450928] ? lock_release (kernel/locking/lockdep.c:?) [ 72.451059] ? clockevents_program_event (kernel/time/clockevents.c:?) [ 72.451222] handle_softirqs (kernel/softirq.c:579) [ 72.451356] ? do_softirq (kernel/softirq.c:480) [ 72.451480] ? idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf [ 72.451635] do_softirq (kernel/softirq.c:480) [ 72.451750] </IRQ> [ 72.451828] <TASK> [ 72.451905] __local_bh_enable_ip (kernel/softirq.c:?) [ 72.452051] idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf [ 72.452210] idpf_send_delete_queues_msg (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:2083) idpf [ 72.452390] idpf_vport_stop (drivers/net/ethernet/intel/idpf/idpf_lib.c:837 drivers/net/ethernet/intel/idpf/idpf_lib.c:868) idpf [ 72.452541] ? idpf_vport_stop (include/linux/bottom_half.h:? include/linux/netdevice.h:4762 drivers/net/ethernet/intel/idpf/idpf_lib.c:855) idpf [ 72.452695] idpf_initiate_soft_reset (drivers/net/ethernet/intel/idpf/idpf_lib.c:?) idpf [ 72.452867] idpf_change_mtu (drivers/net/ethernet/intel/idpf/idpf_lib.c:2189) idpf [ 72.453015] netif_set_mtu_ext (net/core/dev.c:9437) [ 72.453157] ? packet_notifier (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/packet/af_packet.c:4240) [ 72.453292] netif_set_mtu (net/core/dev.c:9515) [ 72.453416] dev_set_mtu (net/core/dev_api.c:?) [ 72.453534] bond_change_mtu (drivers/net/bonding/bond_main.c:4833) [ 72.453666] netif_set_mtu_ext (net/core/dev.c:9437) [ 72.453803] do_setlink (net/core/rtnetlink.c:3116) [ 72.453925] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454055] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454185] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454314] ? trace_contention_end (include/trace/events/lock.h:122) [ 72.454467] ? __mutex_lock (arch/x86/include/asm/preempt.h:85 kernel/locking/mutex.c:611 kernel/locking/mutex.c:746) [ 72.454597] ? cap_capable (include/trace/events/capability.h:26) [ 72.454721] ? security_capable (security/security.c:?) [ 72.454857] rtnl_newlink (net/core/rtnetlink.c:?) [ 72.454982] ? lock_is_held_type (kernel/locking/lockdep.c:5599 kernel/locking/lockdep.c:5938) [ 72.455121] ? __lock_acquire (kernel/locking/lockdep.c:?) [ 72.455256] ? __change_page_attr_set_clr (arch/x86/mm/pat/set_memory.c:685) [ 72.455438] ? __lock_acquire (kernel/locking/lockdep.c:?) [ 72.455582] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.455721] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.455848] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.455987] ? lock_release (kernel/locking/lockdep.c:?) [ 72.456117] ? rcu_read_unlock (include/linux/rcupdate.h:341 include/linux/rcupdate.h:871) [ 72.456249] ? __pfx_rtnl_newlink (net/core/rtnetlink.c:3956) [ 72.456388] rtnetlink_rcv_msg (net/core/rtnetlink.c:6955) [ 72.456526] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.456671] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.456802] ? net_generic (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 include/net/netns/generic.h:45) [ 72.456929] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6858) [ 72.457082] netlink_rcv_skb (net/netlink/af_netlink.c:2534) [ 72.457212] netlink_unicast (net/netlink/af_netlink.c:1313) [ 72.457344] netlink_sendmsg (net/netlink/af_netlink.c:1883) [ 72.457476] __sock_sendmsg (net/socket.c:712) [ 72.457602] ____sys_sendmsg (net/socket.c:?) [ 72.457735] ? _copy_from_user (arch/x86/include/asm/uaccess_64.h:126 arch/x86/include/asm/uaccess_64.h:134 arch/x86/include/asm/uaccess_64.h:141 include/linux/uaccess.h:178 lib/usercopy.c:18) [ 72.457875] ___sys_sendmsg (net/socket.c:2620) [ 72.458042] ? __call_rcu_common (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/include/asm/irqflags.h:159 kernel/rcu/tree.c:3107) [ 72.458185] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458324] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.458451] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458588] ? lock_release (kernel/locking/lockdep.c:?) [ 72.458718] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458856] __x64_sys_sendmsg (net/socket.c:2652) [ 72.458997] ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/entry-common.h:198 arch/x86/entry/syscall_64.c:90) [ 72.459136] do_syscall_64 (arch/x86/entry/syscall_64.c:?) [ 72.459259] ? exc_page_fault (arch/x86/mm/fault.c:1542) [ 72.459387] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 72.459555] RIP: 0033:0x7fd15f17cbd0 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250520121908.1805732-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net/enic: Allow at least 8 RQs to always be usedNelson Escobar
Enic started using netif_get_num_default_rss_queues() to set the number of RQs used in commit cc94d6c4d40c ("enic: Adjust used MSI-X wq/rq/cq/interrupt resources in a more robust way") This resulted in machines with less than 16 cpus using less than 8 RQs. Allow enic to use at least 8 RQs no matter how many cpus are in the machine to not impact existing enic workloads after a kernel upgrade. Reviewed-by: John Daley <johndale@cisco.com> Reviewed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250521-enic_min_8rq-v1-1-691bd2353273@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21hinic3: module initialization and tx/rx logicFan Gong
This is [1/3] part of hinic3 Ethernet driver initial submission. With this patch hinic3 is a valid kernel module but non-functional driver. The driver parts contained in this patch: Module initialization. PCI driver registration but with empty id_table. Auxiliary driver registration. Net device_ops registration but open/stop are empty stubs. tx/rx logic. All major data structures of the driver are fully introduced with the code that uses them but without their initialization code that requires management interface with the hw. Co-developed-by: Xin Guo <guoxin09@huawei.com> Signed-off-by: Xin Guo <guoxin09@huawei.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Co-developed-by: Gur Stavi <gur.stavi@huawei.com> Signed-off-by: Gur Stavi <gur.stavi@huawei.com> Link: https://patch.msgid.link/76a137ffdfe115c737c2c224f0c93b60ba53cc16.1747736586.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21nfc: Correct Samsung "Electronics" spelling in copyright headersSumanth Gavini
Fix the misspelling of "Electronics" in copyright headers across: - s3fwrn5 driver - virtual_ncidev driver Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250520072119.176018-1-sumanth.gavini@yahoo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21emulex/benet: correct command version selection in be_cmd_get_stats()Alok Tiwari
Logic here always sets hdr->version to 2 if it is not a BE3 or Lancer chip, even if it is BE2. Use 'else if' to prevent multiple assignments, setting version 0 for BE2, version 1 for BE3 and Lancer, and version 2 for others. Fixes potential incorrect version setting when BE2_chip and BE3_chip/lancer_chip checks could both be true. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250519141731.691136-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21Merge branch 'net-phy-add-support-for-new-aeonsemi-phys'Jakub Kicinski
Christian Marangi says: ==================== net: phy: Add support for new Aeonsemi PHYs Add support for new Aeonsemi 10G C45 PHYs. These PHYs intergate an IPC to setup some configuration and require special handling to sync with the parity bit. The parity bit is a way the IPC use to follow correct order of command sent. Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1, AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1, AS21210PB1 that all register with the PHY ID 0x7500 0x7500 before the firmware is loaded. The big special thing about this PHY is that it does provide a generic PHY ID in C45 register that change to the correct one one the firmware is loaded. In practice: - MMD 0x7 ID 0x7500 0x9410 -> FW LOAD -> ID 0x7500 0x9422 To handle this, we operate on .match_phy_device where we check the PHY ID, if the ID match the generic one, we load the firmware and we return 0 (PHY driver doesn't match). Then PHY core will try the next PHY driver in the list and this time the PHY is correctly filled in and we register for it. To help in the matching and not modify part of the PHY device struct, .match_phy_device is extended to provide also the current phy_driver is trying to match for. This add the extra benefits that some other PHY can simplify their .match_phy_device OP. ==================== Link: https://patch.msgid.link/20250517201353.5137-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21dt-bindings: net: Document support for Aeonsemi PHYsChristian Marangi
Add Aeonsemi PHYs and the requirement of a firmware to correctly work. Also document the max number of LEDs supported and what PHY ID expose when no firmware is loaded. Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1, AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1, AS21210PB1 that all register with the PHY ID 0x7500 0x9410 on C45 registers before the firmware is loaded. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-7-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: phy: Add support for Aeonsemi AS21xxx PHYsChristian Marangi
Add support for Aeonsemi AS21xxx 10G C45 PHYs. These PHYs integrate an IPC to setup some configuration and require special handling to sync with the parity bit. The parity bit is a way the IPC use to follow correct order of command sent. Supported PHYs AS21011JB1, AS21011PB1, AS21010JB1, AS21010PB1, AS21511JB1, AS21511PB1, AS21510JB1, AS21510PB1, AS21210JB1, AS21210PB1 that all register with the PHY ID 0x7500 0x7510 before the firmware is loaded. They all support up to 5 LEDs with various HW mode supported. While implementing it was found some strange coincidence with using the same logic for implementing C22 in MMD regs in Broadcom PHYs. For reference here the AS21xxx PHY name logic: AS21x1xxB1 ^ ^^ | |J: Supports SyncE/PTP | |P: No SyncE/PTP support | 1: Supports 2nd Serdes | 2: Not 2nd Serdes support 0: 10G, 5G, 2.5G 5: 5G, 2.5G 2: 2.5G Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-6-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: phy: introduce genphy_match_phy_device()Christian Marangi
Introduce new API, genphy_match_phy_device(), to provide a way to check to match a PHY driver for a PHY device based on the info stored in the PHY device struct. The function generalize the logic used in phy_bus_match() to check the PHY ID whether if C45 or C22 ID should be used for matching. This is useful for custom .match_phy_device function that wants to use the generic logic under some condition. (example a PHY is already setup and provide the correct PHY ID) Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-5-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: phy: nxp-c45-tja11xx: simplify .match_phy_device OPChristian Marangi
Simplify .match_phy_device OP by using a generic function and using the new phy_id PHY driver info instead of hardcoding the matching PHY ID with new variant for macsec and no_macsec PHYs. Also make use of PHY_ID_MATCH_MODEL macro and drop PHY_ID_MASK define to introduce phy_id and phy_id_mask again in phy_driver struct. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-4-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: phy: bcm87xx: simplify .match_phy_device OPChristian Marangi
Simplify .match_phy_device OP by using a generic function and using the new phy_id PHY driver info instead of hardcoding the matching PHY ID. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-3-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: phy: pass PHY driver to .match_phy_device OPChristian Marangi
Pass PHY driver pointer to .match_phy_device OP in addition to phydev. Having access to the PHY driver struct might be useful to check the PHY ID of the driver is being matched for in case the PHY ID scanned in the phydev is not consistent. A scenario for this is a PHY that change PHY ID after a firmware is loaded, in such case, the PHY ID stored in PHY device struct is not valid anymore and PHY will manually scan the ID in the match_phy_device function. Having the PHY driver info is also useful for those PHY driver that implement multiple simple .match_phy_device OP to match specific MMD PHY ID. With this extra info if the parsing logic is the same, the matching function can be generalized by using the phy_id in the PHY driver instead of hardcoding. Rust wrapper callback is updated to align to the new match_phy_device arguments. Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Benno Lossin <lossin@kernel.org> # for Rust Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20250517201353.5137-2-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: libwx: Fix log levelJiawen Wu
There is a log should be printed as info level, not error level. Fixes: 9bfd65980f8d ("net: libwx: Add sriov api for wangxun nics") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/67409DB57B87E2F0+20250519063357.21164-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21rtase: Use min() instead of min_t()Justin Lai
Use min() instead of min_t() to avoid the possibility of casting to the wrong type. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250520042031.9297-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21Merge branch 'net-faster-and-simpler-crc32c-computation'Jakub Kicinski
Eric Biggers says: ==================== net: faster and simpler CRC32C computation Update networking code that computes the CRC32C of packets to just call crc32c() without unnecessary abstraction layers. The result is faster and simpler code. Patches 1-7 add skb_crc32c() and remove the overly-abstracted and inefficient __skb_checksum(). Patches 8-10 replace skb_copy_and_hash_datagram_iter() with skb_copy_and_crc32c_datagram_iter(), eliminating the unnecessary use of the crypto layer. This unblocks the conversion of nvme-tcp to call crc32c() directly instead of using the crypto layer, which patch 9 does. v1: https://lore.kernel.org/20250511004110.145171-1-ebiggers@kernel.org ==================== Link: https://patch.msgid.link/20250519175012.36581-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: remove skb_copy_and_hash_datagram_iter()Eric Biggers
Now that skb_copy_and_hash_datagram_iter() is no longer used, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-11-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()Eric Biggers
Now that the crc32c() library function directly takes advantage of architecture-specific optimizations and there also now exists a function skb_copy_and_crc32c_datagram_iter(), it is unnecessary to go through the crypto_ahash API. Just use those functions. This is much simpler, and it also improves performance due to eliminating the crypto API overhead. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-10-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: add skb_copy_and_crc32c_datagram_iter()Eric Biggers
Since skb_copy_and_hash_datagram_iter() is used only with CRC32C, the crypto_ahash abstraction provides no value. Add skb_copy_and_crc32c_datagram_iter() which just calls crc32c() directly. This is faster and simpler. It also doesn't have the weird dependency issue where skb_copy_and_hash_datagram_iter() depends on CONFIG_CRYPTO_HASH=y without that being expressed explicitly in the kconfig (presumably because it was too heavyweight for NET to select). The new function is conditional on the hidden boolean symbol NET_CRC32C, which selects CRC32. So it gets compiled only when something that actually needs CRC32C packet checksums is enabled, it has no implicit dependency, and it doesn't depend on the heavyweight crypto layer. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-9-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21lib/crc32: remove unused support for CRC32C combinationEric Biggers
crc32c_combine() and crc32c_shift() are no longer used (except by the KUnit test that tests them), and their current implementation is very slow. Remove them. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-8-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: fold __skb_checksum() into skb_checksum()Eric Biggers
Now that the only remaining caller of __skb_checksum() is skb_checksum(), fold __skb_checksum() into skb_checksum(). This makes struct skb_checksum_ops unnecessary, so remove that too and simply do the "regular" net checksum. It also makes the wrapper functions csum_partial_ext() and csum_block_add_ext() unnecessary, so remove those too and just use the underlying functions. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-7-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21sctp: use skb_crc32c() instead of __skb_checksum()Eric Biggers
Make sctp_compute_cksum() just use the new function skb_crc32c(), instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C. This is faster and simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21RDMA/siw: use skb_crc32c() instead of __skb_checksum()Eric Biggers
Instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C, just call the new function skb_crc32c(). This is faster and simpler. Acked-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: use skb_crc32c() in skb_crc32c_csum_help()Eric Biggers
Instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C, just call the new function skb_crc32c(). This is faster and simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-4-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: add skb_crc32c()Eric Biggers
Add skb_crc32c(), which calculates the CRC32C of a sk_buff. It will replace __skb_checksum(), which unnecessarily supports arbitrary checksums. Compared to __skb_checksum(), skb_crc32c(): - Uses the correct type for CRC32C values (u32, not __wsum). - Does not require the caller to provide a skb_checksum_ops struct. - Is faster because it does not use indirect calls and does not use the very slow crc32c_combine(). According to commit 2817a336d4d5 ("net: skb_checksum: allow custom update/combine for walking skb") which added __skb_checksum(), the original motivation for the abstraction layer was to avoid code duplication for CRC32C and other checksums in the future. However: - No additional checksums showed up after CRC32C. __skb_checksum() is only used with the "regular" net checksum and CRC32C. - Indirect calls are expensive. Commit 2544af0344ba ("net: avoid indirect calls in L4 checksum calculation") worked around this using the INDIRECT_CALL_1 macro. But that only avoided the indirect call for the net checksum, and at the cost of an extra branch. - The checksums use different types (__wsum and u32), causing casts to be needed. - It made the checksums of fragments be combined (rather than chained) for both checksums, despite this being highly counterproductive for CRC32C due to how slow crc32c_combine() is. This can clearly be seen in commit 4c2f24549644 ("sctp: linearize early if it's not GSO") which tried to work around this performance bug. With a dedicated function for each checksum, we can instead just use the proper strategy for each checksum. As shown by the following tables, the new function skb_crc32c() is faster than __skb_checksum(), with the improvement varying greatly from 5% to 2500% depending on the case. The largest improvements come from fragmented packets, mainly due to eliminating the inefficient crc32c_combine(). But linear packets are improved too, especially shorter ones, mainly due to eliminating indirect calls. These benchmarks were done on AMD Zen 5. On that CPU, Linux uses IBRS instead of retpoline; an even greater improvement might be seen with retpoline: Linear sk_buffs Length in bytes __skb_checksum cycles skb_crc32c cycles =============== ===================== ================= 64 43 18 256 94 77 1420 204 161 16384 1735 1642 Nonlinear sk_buffs (even split between head and one fragment) Length in bytes __skb_checksum cycles skb_crc32c cycles =============== ===================== ================= 64 579 22 256 829 77 1420 1506 194 16384 4365 1682 Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: introduce CONFIG_NET_CRC32CEric Biggers
Add a hidden kconfig symbol NET_CRC32C that will group together the functions that calculate CRC32C checksums of packets, so that these don't have to be built into NET-enabled kernels that don't need them. Make skb_crc32c_csum_help() (which is called only when IP_SCTP is enabled) conditional on this symbol, and make IP_SCTP select it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21Merge branch 'tools-ynl-gen-add-support-for-inherited-selector-and-therefore-tc'Jakub Kicinski
Jakub Kicinski says: ==================== tools: ynl-gen: add support for "inherited" selector and therefore TC Add C codegen support for constructs needed by TC, namely passing sub-message selector from a lower nest, and sub-messages with fixed headers. ==================== Link: https://patch.msgid.link/20250520161916.413298-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl: add a sample for TCJakub Kicinski
Add a very simple TC dump sample with decoding of fq_codel attrs: # ./tools/net/ynl/samples/tc dummy0: fq_codel limit: 10240p target: 5ms new_flow_cnt: 0 proving that selector passing (for stats) works. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-13-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21netlink: specs: tc: add qdisc dump to TC specJakub Kicinski
Hook TC qdisc dump in the TC qdisc get, it only supported doit until now and dumping will be used by the sample code. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-12-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl: enable codegen for TCJakub Kicinski
We are ready to support most of TC. Enable C code gen. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl-gen: support weird sub-message formatsJakub Kicinski
TC uses all possible sub-message formats: - nested attrs - fixed headers + nested attrs - fixed headers - empty Nested attrs are already supported for rt-link. Add support for remaining 3. The empty and fixed headers ones are fairly trivial, we can fake a Binary or Flags type instead of a Nest. For fixed headers + nest we need to teach nest parsing and nest put to handle fixed headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl-gen: support local attrs in _multi_parseJakub Kicinski
The _multi_parse() helper calls the _attr_get() method of each attr, but it only respects what code the helper wants to emit, not what local variables it needs. Local variables will soon be needed, support them. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl-gen: move fixed header info from RenderInfo to StructJakub Kicinski
RenderInfo describes a request-response exchange. Struct describes a parsed attribute set. For ease of parsing sub-messages with fixed headers move fixed header info from RenderInfo to Struct. No functional changes. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl-gen: support passing selector to a nestJakub Kicinski
In rtnetlink all submessages had the selector at the same level of nesting as the submessage. We could refer to the relevant attribute from the current struct. In TC, stats are one level of nesting deeper than "kind". Teach the code-gen about structs which need to be passed a selector by the caller for parsing. Because structs are "topologically sorted" one pass of propagating the selectors down is enough. For generating netlink message we depend on the presence bits so no selector passing needed there. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21netlink: specs: tc: drop the family name prefix from attrsJakub Kicinski
All attribute sets and messages are prefixed with tc-. The C codegen also adds the family name to all structs. We end up with names like struct tc_tc_act_attrs. Remove the tc- prefixes to shorten the names. This should not impact Python as the attr set names are never exposed to user, they are only used to refer to things internally, in the encoder / decoder. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21netlink: specs: tc: add C naming infoJakub Kicinski
Add naming info needed by C code gen. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21netlink: specs: tc: use tc-gact instead of tc-gen as struct nameJakub Kicinski
There is a define in the uAPI header called tc_gen which expands to the "generic" TC action fields. This helps other actions include the base fields without having to deal with nested structs. A couple of actions (sample, gact) do not define extra fields, so the spec used a common tc-gen struct for both of them. Unfortunately this struct does not exist in C. Let's use gact's (generic act's) struct for basic actions. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21netlink: specs: tc: remove duplicate nestsJakub Kicinski
tc-act-stats-attrs and tca-stats-attrs are almost identical. The only difference is that the latter has sub-message decoding for app, rather than declaring it as a binary attr. tc-act-police-attrs and tc-police-attrs are identical but for the TODO annotations. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250520161916.413298-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21tools: ynl-gen: add makefile deps for neighJakub Kicinski
Kory is reporting build issues after recent additions to YNL if the system headers are old. Link: https://lore.kernel.org/20250519164949.597d6e92@kmaincent-XPS-13-7390 Reported-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: 0939a418b3b0 ("tools: ynl: submsg: reverse parse / error reporting") Tested-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250520161916.413298-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20Merge branch 'net-airoha-add-per-flow-stats-support-to-hw-flowtable-offloading'Jakub Kicinski
Lorenzo Bianconi says: ==================== net: airoha: Add per-flow stats support to hw flowtable offloading Introduce per-flow stats accounting to the flowtable hw offload in the airoha_eth driver. Flow stats are split in the PPE and NPU modules: - PPE: accounts for high 32bit of per-flow stats - NPU: accounts for low 32bit of per-flow stats v1: https://lore.kernel.org/20250514-airoha-en7581-flowstats-v1-0-c00ede12a2ca@kernel.org ==================== Link: https://patch.msgid.link/20250516-airoha-en7581-flowstats-v2-0-06d5fbf28984@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20net: airoha: ppe: Disable packet keepaliveLorenzo Bianconi
Since netfilter flowtable entries are now refreshed by flow-stats polling, we can disable hw packet keepalive used to periodically send packets belonging to offloaded flows to the kernel in order to refresh flowtable entries. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250516-airoha-en7581-flowstats-v2-3-06d5fbf28984@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20net: airoha: Add FLOW_CLS_STATS callback supportLorenzo Bianconi
Introduce per-flow stats accounting to the flowtable hw offload in the airoha_eth driver. Flow stats are split in the PPE and NPU modules: - PPE: accounts for high 32bit of per-flow stats - NPU: accounts for low 32bit of per-flow stats FLOW_CLS_STATS can be enabled or disabled at compile time. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250516-airoha-en7581-flowstats-v2-2-06d5fbf28984@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20net: airoha: npu: Move memory allocation in airoha_npu_send_msg() callerLorenzo Bianconi
Move ppe_mbox_data struct memory allocation from airoha_npu_send_msg routine to the caller one. This is a preliminary patch to enable wlan NPU offloading and flow counter stats support. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250516-airoha-en7581-flowstats-v2-1-06d5fbf28984@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20Merge branch 'ipv6-follow-up-for-rtnl-free-rtm_newroute-series'Jakub Kicinski
Kuniyuki Iwashima says: ==================== ipv6: Follow up for RTNL-free RTM_NEWROUTE series. Patch 1 removes rcu_read_lock() in fib6_get_table(). Patch 2 removes rtnl_is_held arg for lwtunnel_valid_encap_type(), which was short-term fix and is no longer used. Patch 3 fixes RCU vs GFP_KERNEL report by syzkaller. Patch 4~7 reverts GFP_ATOMIC uses to GFP_KERNEL. v1: https://lore.kernel.org/20250514201943.74456-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250516022759.44392-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20ipv6: Revert two per-cpu var allocation for RTM_NEWROUTE.Kuniyuki Iwashima
These two commits preallocated two per-cpu variables in ip6_route_info_create() as fib_nh_common_init() and fib6_nh_init() were expected to be called under RCU. * commit d27b9c40dbd6 ("ipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create().") * commit 5720a328c3e9 ("ipv6: Preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().") Now these functions can be called without RCU and can use GFP_KERNEL. Let's revert the commits. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250516022759.44392-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20ipv6: Pass gfp_flags down to ip6_route_info_create_nh().Kuniyuki Iwashima
Since commit c4837b9853e5 ("ipv6: Split ip6_route_info_create()."), ip6_route_info_create_nh() uses GFP_ATOMIC as it was expected to be called under RCU. Now, we can call it without RCU and use GFP_KERNEL. Let's pass gfp_flags to ip6_route_info_create_nh(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250516022759.44392-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20Revert "ipv6: Factorise ip6_route_multipath_add()."Kuniyuki Iwashima
Commit 71c0efb6d12f ("ipv6: Factorise ip6_route_multipath_add().") split a loop in ip6_route_multipath_add() so that we can put rcu_read_lock() between ip6_route_info_create() and ip6_route_info_create_nh(). We no longer need to do so as ip6_route_info_create_nh() does not require RCU now. Let's revert the commit to simplify the code. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250516022759.44392-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>