path: root/include/net
Age | Commit message | Author
2025-02-21 | net: Use link/peer netns in newlink() of rtnl_link_ops | Xiao Liang
Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, and link netns falls back to source netns. Convert the use of params->net in netdevice drivers to one of the helper functions for clarity. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
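The fallback chain described above can be sketched in plain C. This is only an illustration under assumptions: the struct layout, the field names and the `sketch_` function names are hypothetical stand-ins, not the kernel's definitions of rtnl_newlink_link_net() and rtnl_newlink_peer_net().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct net (assumption). */
struct net { int id; };

/* Hypothetical parameter block; field names are assumptions. */
struct newlink_params_sketch {
	struct net *src_net;   /* netns of the netlink socket, always set */
	struct net *link_net;  /* NULL when no link netns was requested */
	struct net *peer_net;  /* NULL when no peer netns was requested */
};

/* Link netns falls back to source netns. */
static struct net *sketch_link_net(const struct newlink_params_sketch *p)
{
	assert(p->src_net != NULL);
	return p->link_net ? p->link_net : p->src_net;
}

/* Peer netns falls back to link netns, which itself falls back to source. */
static struct net *sketch_peer_net(const struct newlink_params_sketch *p)
{
	return p->peer_net ? p->peer_net : sketch_link_net(p);
}
```

With only `src_net` set, both helpers return the source netns; setting `link_net` changes both results, and setting `peer_net` changes only the peer lookup.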
2025-02-21 | rtnetlink: Pack newlink() params into struct | Xiao Liang
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead.

To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. Uses of src_net are converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
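The table in the message above can be read as a small decision function. The sketch below encodes it directly; the enum, struct and function names are hypothetical and exist only to make the four rows checkable, they are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Which namespace a pointer refers to (illustrative labels only). */
enum netns_kind { NETNS_SOURCE, NETNS_TARGET, NETNS_LINK, NETNS_PEER };

struct old_newlink_nets {
	enum netns_kind src_net;  /* the "src_net" argument of newlink() */
	enum netns_kind dev_net;  /* the netns the device is created in */
};

/* Encodes the table: how src_net/dev_net were chosen before the patch. */
static struct old_newlink_nets
sketch_old_newlink_nets(bool peer_netns_present, bool link_netnsid_present)
{
	struct old_newlink_nets r;

	/* With IFLA_LINK_NETNSID the device starts out in the link netns. */
	r.dev_net = link_netnsid_present ? NETNS_LINK : NETNS_TARGET;

	/* A peer netns, when given, always wins for src_net. */
	if (peer_netns_present)
		r.src_net = NETNS_PEER;
	else
		r.src_net = link_netnsid_present ? NETNS_LINK : NETNS_SOURCE;

	return r;
}
```

Reading the function side by side with the table shows why src_net is ambiguous: depending on the request it can mean source, link or peer netns.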
2025-02-21 | xfrm: check for PMTU in tunnel mode for packet offload | Leon Romanovsky
In tunnel mode, for the packet offload, there was no PMTU signaling to the upper level about the need to fragment the packet. As a solution, call the already existing xfrm[4|6]_tunnel_check_size() to perform that. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21 | xfrm: simplify SA initialization routine | Leon Romanovsky
SA replay mode is initialized differently for user-space and kernel-space users, but the call to xfrm_init_replay() existed in the common path with boolean protection. That led to a situation where we have two different function call orders. So let's rewrite the SA initialization flow to have the same order for both in-kernel and user-space callers. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21 | xfrm: delay initialization of offload path till its actually requested | Leon Romanovsky
XFRM offload path is probed even if offload isn't needed at all. Let's make sure that x->type_offload pointer stays NULL for such path to reduce ambiguity. Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-20 | Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Linus Torvalds
Pull BPF fixes from Daniel Borkmann:

 - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire)
 - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi)
 - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima)
 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
 - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko)
 - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida)
 - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)
 - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen)
 - Two fixes for BPF map lookup nullness elision (Daniel Xu)
 - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests: bpf: test batch lookup on array of maps with holes
  bpf: skip non exist keys in generic_map_lookup_batch
  bpf: Handle allocation failure in acquire_lock_state
  bpf: verifier: Disambiguate get_constant_map_key() errors
  bpf: selftests: Test constant key extraction on irrelevant maps
  bpf: verifier: Do not extract constant map keys for irrelevant maps
  bpf: Fix softlockup in arena_map_free on 64k page kernel
  net: Add rx_skb of kfree_skb to raw_tp_null_args[].
  bpf: Fix deadlock when freeing cgroup storage
  selftests/bpf: Add strparser test for bpf
  selftests/bpf: Fix invalid flag of recv()
  bpf: Disable non stream socket for strparser
  bpf: Fix wrong copied_seq calculation
  strparser: Add read_sock callback
  bpf: avoid holding freeze_mutex during mmap operation
  bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
  selftests/bpf: Adjust data size to have ETH_HLEN
  bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
  bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20 | xsk: Add launch time hardware offload support to XDP Tx metadata | Song Yoong Siang
Extend the XDP Tx metadata framework so that the user can request launch time hardware offload, where the Ethernet device will schedule the packet for transmission at a pre-determined time called launch time. The value of launch time is communicated from user space to the Ethernet driver via the launch_time field of struct xsk_tx_metadata. Suggested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-20 | bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback | Jason Xing
Support the ACK case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_ACK. The BPF program can use it to get the same SCM_TSTAMP_ACK timestamp without modifying the user-space application. This patch extends txstamp_ack to two bits: 1 stands for SO_TIMESTAMPING mode, 2 bpf extension. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com
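The two-bit txstamp_ack encoding mentioned above can be illustrated with a small bitfield. This is a sketch under assumptions: the macro names, the struct and the helper functions are hypothetical; the commit message only states that value 1 stands for SO_TIMESTAMPING mode and value 2 for the bpf extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; only the bit meanings come from the commit. */
#define SKETCH_TSTAMP_ACK_SK   0x1u /* requested via SO_TIMESTAMPING */
#define SKETCH_TSTAMP_ACK_BPF  0x2u /* requested by the bpf extension */

/* Stand-in for the per-skb control block holding the field (assumption). */
struct skb_cb_sketch {
	unsigned int txstamp_ack : 2;
};

/* Record that one of the two users wants an ACK timestamp. */
static void sketch_request_ack_tstamp(struct skb_cb_sketch *cb, unsigned int who)
{
	assert((who & ~0x3u) == 0);
	cb->txstamp_ack |= who;
}

/* Check whether a given user asked for an ACK timestamp. */
static bool sketch_wants_ack_tstamp(const struct skb_cb_sketch *cb, unsigned int who)
{
	return (cb->txstamp_ack & who) != 0;
}
```

Using separate bits lets both consumers request the same timestamp independently, which matches the stated goal of not modifying the user-space application.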
2025-02-20 | bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback | Jason Xing
The subsequent patch will implement BPF TX timestamping. It will call the sockops BPF program without holding the sock lock. This breaks the current assumption that all sock ops programs will hold the sock lock. The sock's fields of the uapi's bpf_sock_ops require this assumption. To address this, a new "u8 is_locked_tcp_sock;" field is added. This patch sets it in the current sock_ops callbacks. The "is_fullsock" test is then replaced by the "is_locked_tcp_sock" test during sock_ops_convert_ctx_access(). The new TX timestamping callbacks added in the subsequent patch will not have this set. This will prevent unsafe access from the new timestamping callbacks. Potentially, we could allow read-only access. However, this would require identifying which callback is read-safe-only and also requires additional BPF instruction rewrites in the convert_ctx. Since the BPF program can always read everything from a socket (e.g., by using bpf_core_cast), this patch keeps it simple and disables all read and write access to any socket fields through the bpf_sock_ops UAPI from the new TX timestamping callback. Moreover, note that some of the fields in bpf_sock_ops are specific to tcp_sock, and sock_ops currently only supports tcp_sock. In the future, UDP timestamping will be added, which will also break this assumption. The same idea used in this patch will be reused. Considering that the current sock_ops only supports tcp_sock, the variable is named is_locked_"tcp"_sock. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
2025-02-20 | bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping | Jason Xing
This patch introduces a new bpf_skops_tx_timestamping() function that prepares the "struct bpf_sock_ops" ctx and then executes the sockops BPF program. The subsequent patch will utilize bpf_skops_tx_timestamping() at the existing TX timestamping kernel callbacks (__sk_tstamp_tx specifically) to call the sockops BPF program. Later, four callback points to report information to user space based on this patch will be introduced. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250220072940.99994-3-kerneljasonxing@gmail.com
2025-02-20 | bpf: Add networking timestamping support to bpf_get/setsockopt() | Jason Xing
The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are added to bpf_get/setsockopt. The later patches will implement the BPF networking timestamping. The BPF program will use bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to enable the BPF networking timestamping on a socket. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
2025-02-20 | net: Add options as a flexible array to struct ip_tunnel_info | Gal Pressman
Remove the hidden assumption that options are allocated at the end of the struct, and teach the compiler about them using a flexible array.

With this, we can revert the unsafe_memcpy() call we have in tun_dst_unclone() [1], and resolve the false field-spanning write warning caused by the memcpy() in ip_tunnel_info_opts_set().

The layout of struct ip_tunnel_info remains the same with this patch. Before this patch, there was an implicit padding at the end of the struct; options would be written at 'info + 1', which is after the padding. This will remain the same as this patch explicitly aligns 'options'. The alignment is needed as the options are later cast to different structs, and might result in unaligned memory access.

Pahole output before this patch:

  struct ip_tunnel_info {
      struct ip_tunnel_key       key;                  /*     0    64 */

      /* XXX last struct has 1 byte of padding */

      /* --- cacheline 1 boundary (64 bytes) --- */
      struct ip_tunnel_encap     encap;                /*    64     8 */
      struct dst_cache           dst_cache;            /*    72    16 */
      u8                         options_len;          /*    88     1 */
      u8                         mode;                 /*    89     1 */

      /* size: 96, cachelines: 2, members: 5 */
      /* padding: 6 */
      /* paddings: 1, sum paddings: 1 */
      /* last cacheline: 32 bytes */
  };

Pahole output after this patch:

  struct ip_tunnel_info {
      struct ip_tunnel_key       key;                  /*     0    64 */

      /* XXX last struct has 1 byte of padding */

      /* --- cacheline 1 boundary (64 bytes) --- */
      struct ip_tunnel_encap     encap;                /*    64     8 */
      struct dst_cache           dst_cache;            /*    72    16 */
      u8                         options_len;          /*    88     1 */
      u8                         mode;                 /*    89     1 */

      /* XXX 6 bytes hole, try to pack */

      u8                         options[] __attribute__((__aligned__(16))); /*    96     0 */

      /* size: 96, cachelines: 2, members: 6 */
      /* sum members: 90, holes: 1, sum holes: 6 */
      /* paddings: 1, sum paddings: 1 */
      /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
      /* last cacheline: 32 bytes */
  } __attribute__((__aligned__(16)));

[1] Commit 13cfd6a6d7ac ("net: Silence false field-spanning write warning in metadata_dst memcpy")

Link: https://lore.kernel.org/all/53D1D353-B8F6-4ADC-8F29-8C48A7C9C6F1@kernel.org/
Suggested-by: Kees Cook <kees@kernel.org>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250219143256.370277-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 | ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1' | Gal Pressman
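The layout property claimed in the message above (an aligned flexible array member starts exactly where 'info + 1' used to point) can be checked in user space. The following is a minimal sketch with a made-up struct, assuming a GCC- or Clang-compatible compiler for the alignment attribute; `tun_info_sketch` and its allocator are hypothetical and much smaller than the real struct ip_tunnel_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Made-up miniature of a tunnel info struct with trailing options. */
struct tun_info_sketch {
	uint8_t key[64];
	uint8_t options_len;
	uint8_t mode;
	/* Explicit alignment keeps options at the old 'info + 1' address. */
	uint8_t options[] __attribute__((aligned(16)));
};

/* Allocate the struct plus room for the options and copy them in. */
static struct tun_info_sketch *tun_info_sketch_alloc(const void *opts, uint8_t len)
{
	struct tun_info_sketch *info = calloc(1, sizeof(*info) + len);

	if (!info)
		return NULL;
	if (len)
		memcpy(info->options, opts, len); /* write through the named member */
	info->options_len = len;
	return info;
}
```

Because the copy goes through the named `options` member instead of pointer arithmetic past the struct, the compiler can see the destination object, which is the reason the commit gives for the field-spanning write warning going away.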
Tunnel options should not be accessed directly, use the ip_tunnel_info() accessor instead. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250219143256.370277-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc4). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 | net: allow small head cache usage with large MAX_SKB_FRAGS values | Paolo Abeni
Sabrina reported the following splat: WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0 Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48 RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6 RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168 R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> gro_cells_init+0x1ba/0x270 xfrm_input_init+0x4b/0x2a0 xfrm_init+0x38/0x50 ip_rt_init+0x2d7/0x350 ip_init+0xf/0x20 inet_init+0x406/0x590 do_one_initcall+0x9d/0x2e0 do_initcalls+0x23b/0x280 kernel_init_freeable+0x445/0x490 kernel_init+0x20/0x1d0 ret_from_fork+0x46/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> irq event stamp: 584330 hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0 hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0 softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470 softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0 on kernel built with 
MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024) is smaller than GRO_MAX_HEAD. Such a build additionally contains the revert of the single page frag cache so that napi_get_frags() ends up using the page frag allocator, triggering the splat. Note that the underlying issue is independent from the mentioned revert; address it by ensuring that the small head cache will fit both TCP and GRO allocations and by updating napi_alloc_skb() and __netdev_alloc_skb() to select kmalloc() usage for any allocation fitting such cache. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20 | tcp: drop secpath at the same time as we currently drop dst | Sabrina Dubroca
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to: - create a pair of netns - run a basic TCP test over ipcomp6 - delete the pair of netns The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free. The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point. We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-19 | net: dismiss sk_forward_alloc_get() | Paolo Abeni
After the previous patch we can remove the forward_alloc_get proto callback, basically reverting commit 292e6077b040 ("net: introduce sk_forward_alloc_get()") and commit 66d58f046c9d ("net: use sk_forward_alloc_get() in sk_get_meminfo()"). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-5-4a47d90d7998@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 | ipv4: fib_rules: Add port mask matching | Ido Schimmel
Extend IPv4 FIB rules to match on source and destination ports using a mask. Note that the mask is only set when not matching on a range. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250217134109.311176-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
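Port matching with a mask, as opposed to a range, can be sketched as follows. This is an illustration under assumptions: the struct, its field names and the convention that a zero mask means "match on the range" are hypothetical, and ports are kept in host byte order for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule: either a [start, end] range or a value/mask pair. */
struct port_rule_sketch {
	uint16_t start;  /* range start, or the value to compare under mask */
	uint16_t end;    /* range end, unused when mask is set */
	uint16_t mask;   /* 0 means "no mask, match on the range" (assumption) */
};

/* Return true when the packet port satisfies the rule. */
static bool sketch_port_match(const struct port_rule_sketch *r, uint16_t port)
{
	if (r->mask)
		return (port & r->mask) == (r->start & r->mask);
	return port >= r->start && port <= r->end;
}
```

A mask of 0xff00 with value 0x1000, for instance, matches every port from 0x1000 through 0x10ff without spelling out a range.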
2025-02-19 | net: fib_rules: Add port mask support | Ido Schimmel
Add support for configuring and deleting rules that match on source and destination ports using a mask as well as support for dumping such rules to user space. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250217134109.311176-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: Add net_passive_inc() and net_passive_dec(). | Kuniyuki Iwashima
net_drop_ns() is NULL when CONFIG_NET_NS is disabled. The next patch introduces a function that increments and decrements net->passive. As a prep, let's rename and export net_free() to net_passive_dec() and add net_passive_inc(). Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250217191129.19967-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv6: initialize inet socket cookies with sockcm_init | Willem de Bruijn
Avoid open coding the same logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-8-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv6: replace ipcm6_init calls with ipcm6_init_sk | Willem de Bruijn
This initializes tclass and dontfrag before cmsg parsing, removing the need for explicit checks against -1 in each caller. Leave hlimit set to -1, because its full initialization (in ip6_sk_dst_hoplimit) requires more state (dst, flowi6, ..). This also prepares for calling sockcm_init in a follow-on patch. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-7-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | ipv4: remove get_rttos | Willem de Bruijn
Initialize the ip cookie tos field when initializing the cookie, in ipcm_init_sk. The existing code inverts the standard pattern for initializing cookie fields. Default is to initialize the field from the sk, then possibly overwrite that when parsing cmsgs (the unlikely case). This field inverts that, setting the field to an illegal value and after cmsg parsing checking whether the value is still illegal and thus should be overridden. Be careful to always apply mask INET_DSCP_MASK, as before. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-5-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
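The "standard pattern" the message refers to (initialize the cookie field from the socket first, then let a control message override it) can be sketched as below. All names here are hypothetical stand-ins, and the mask value 0xfc is an assumption used to represent INET_DSCP_MASK; where exactly the kernel applies the mask is not shown by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DSCP_MASK 0xfcu /* assumed stand-in for INET_DSCP_MASK */

struct sock_sketch   { uint8_t tos; };  /* per-socket default */
struct cookie_sketch { uint8_t tos; };  /* per-sendmsg() state */

/* Step 1: default the cookie from the socket. */
static void sketch_cookie_init_sk(struct cookie_sketch *c, const struct sock_sketch *sk)
{
	c->tos = sk->tos & SKETCH_DSCP_MASK;
}

/* Step 2 (the unlikely case): a cmsg overrides the default. */
static void sketch_cookie_apply_cmsg(struct cookie_sketch *c, bool has_tos, uint8_t cmsg_tos)
{
	if (has_tos)
		c->tos = cmsg_tos & SKETCH_DSCP_MASK;
}
```

With this ordering no sentinel "illegal" value is needed, and no caller has to check after cmsg parsing whether the field was left unset.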
2025-02-18 | ipv4: initialize inet socket cookies with sockcm_init | Willem de Bruijn
Avoid open coding the same logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-4-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 | net: initialize mark in sockcm_init | Willem de Bruijn
Avoid open coding initialization of sockcm fields. Avoid reading the sk_priority field twice. This ensures all callers, existing and future, will correctly try a cmsg passed mark before sk_mark. This patch extends support for cmsg mark to: packet_spkt and packet_tpacket and net/can/raw.c. This patch extends support for cmsg priority to: packet_spkt and packet_tpacket. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-3-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski
Tony Nguyen says:

====================
ice, iavf: Add support for Rx timestamping

Mateusz Polchlopek says:

Initially, during VF creation it registers the PTP clock in the system and negotiates its capabilities with the PF. In the meantime the PF enables the Flexible Descriptor for the VF. Only this type of descriptor allows receiving Rx timestamps.

Enabling virtual clock would be possible, though it would probably perform poorly due to the lack of direct time access.

Enabling timestamping should be done using userspace tools, e.g.
hwstamp_ctl -i $VF -r 14

In order to report the timestamps to userspace, the VF extends the timestamp to 40b.

To support this feature the flexible descriptors and PTP part in the iavf driver have been introduced.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: add support for Rx timestamps to hotpath
  iavf: handle set and get timestamps ops
  iavf: Implement checking DD desc field
  iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  iavf: define Rx descriptors as qwords
  libeth: move idpf_rx_csum_decoded and idpf_rx_extracted
  iavf: periodically cache PHC time
  iavf: add support for indirect access to PHC time
  iavf: add initial framework for registering PTP clock
  iavf: negotiate PTP capabilities
  iavf: add support for negotiating flexible RXDID format
  virtchnl: add enumeration for the rxdid format
  ice: support Rx timestamp on flex descriptor
  virtchnl: add support for enabling PTP on iAVF
====================

Link: https://patch.msgid.link/20250214192739.1175740-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17 | netlink: Add nla_put_empty_nest helper | Joe Damato
Creating empty nests is helpful when the exact attributes to be exposed in the future are not known. Encapsulate the logic in a helper. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
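What an empty nest looks like on the wire can be shown with a user-space model: it is an attribute header whose length covers only the header itself and whose type carries the nested flag. The sketch below does not call the kernel helper; the struct, macros and function are hypothetical re-creations that assume the usual 4-byte netlink attribute header and a nested flag in bit 15 of the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the netlink attribute header (assumed 4 bytes, no padding). */
struct nlattr_sketch {
	uint16_t nla_len;   /* header plus payload, here header only */
	uint16_t nla_type;  /* attribute type, with flag bits */
};

#define SKETCH_NLA_HDRLEN   4u
#define SKETCH_NLA_F_NESTED (1u << 15)

/* Append an empty nested attribute to buf; returns 0 or -1 if no room. */
static int sketch_put_empty_nest(uint8_t *buf, size_t cap, size_t *used,
				 uint16_t attrtype)
{
	struct nlattr_sketch hdr;

	if (*used > cap || cap - *used < SKETCH_NLA_HDRLEN)
		return -1;

	hdr.nla_len = SKETCH_NLA_HDRLEN;                       /* no payload */
	hdr.nla_type = (uint16_t)(attrtype | SKETCH_NLA_F_NESTED);
	memcpy(buf + *used, &hdr, sizeof(hdr));
	*used += SKETCH_NLA_HDRLEN;
	return 0;
}
```

A receiver that later learns about child attributes can parse the same nest without any change to the outer message layout, which is the forward-compatibility argument the commit makes.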
2025-02-17 | net: use napi_id_valid helper | Stefano Jordhani
In commit 6597e8d35851 ("netdev-genl: Elide napi_id when not present"), the napi_id_valid() function was added. Use the helper to refactor open-coded checks in the source. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stefano Jordhani <sjordhani@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> # for iouring Link: https://patch.msgid.link/20250214181801.931-1-sjordhani@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
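The refactoring idea, replacing scattered open-coded comparisons with one named predicate, can be sketched as below. This assumes the helper compares the ID against a minimum valid value; the `SKETCH_` constants are made-up numbers for illustration and do not reflect any particular kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up values for illustration only (assumptions). */
#define SKETCH_NR_CPUS      64
#define SKETCH_MIN_NAPI_ID  ((unsigned int)(SKETCH_NR_CPUS + 1))

/* One named predicate instead of repeating "id >= MIN" at every call site. */
static bool sketch_napi_id_valid(unsigned int napi_id)
{
	return napi_id >= SKETCH_MIN_NAPI_ID;
}
```

Call sites then read as `if (sketch_napi_id_valid(id))`, so a future change to the validity rule touches one function only.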
2025-02-14 | net: introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() | Eric Dumazet
We have many EXPORT_SYMBOL(x) in the networking tree because IPv6 can be built as a module. CONFIG_IPV6=y is becoming the norm. Define an EXPORT_IPV6_MOD(x) which only exports x for modular IPv6. The same principle applies to EXPORT_IPV6_MOD_GPL(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250212132418.1524422-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14 | libeth: move idpf_rx_csum_decoded and idpf_rx_extracted | Mateusz Polchlopek
Structs idpf_rx_csum_decoded and idpf_rx_extracted are used both in idpf and iavf Intel drivers. Change the prefix from idpf_* to libeth_* and move mentioned structs to libeth's rx.h header file. Adjust usage in idpf driver. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13 | Bluetooth: L2CAP: Fix corrupted list in hci_chan_del | Luiz Augusto von Dentz
This fixes the following trace by reworking the locking of l2cap_conn so instead of only locking when changing the chan_l list this promotes chan_lock to a general lock of l2cap_conn so whenever it is being held it would prevents the likes of l2cap_conn_del to run: list_del corruption, ffff888021297e00->prev is LIST_POISON2 (dead000000000122) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:61! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 5896 Comm: syz-executor213 Not tainted 6.14.0-rc1-next-20250204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024 RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59 Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246 RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0 R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122 R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00 FS: 00007f7ace6686c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7aceeeb1d0 CR3: 000000003527c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:124 [inline] __list_del_entry include/linux/list.h:215 [inline] list_del_rcu include/linux/rculist.h:168 [inline] hci_chan_del+0x70/0x1b0 net/bluetooth/hci_conn.c:2858 l2cap_conn_free net/bluetooth/l2cap_core.c:1816 [inline] kref_put include/linux/kref.h:65 [inline] l2cap_conn_put+0x70/0xe0 
net/bluetooth/l2cap_core.c:1830 l2cap_sock_shutdown+0xa8a/0x1020 net/bluetooth/l2cap_sock.c:1377 l2cap_sock_release+0x79/0x1d0 net/bluetooth/l2cap_sock.c:1416 __sock_release net/socket.c:642 [inline] sock_close+0xbc/0x240 net/socket.c:1393 __fput+0x3e9/0x9f0 fs/file_table.c:448 task_work_run+0x24f/0x310 kernel/task_work.c:227 ptrace_notify+0x2d2/0x380 kernel/signal.c:2522 ptrace_report_syscall include/linux/ptrace.h:415 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline] syscall_exit_work+0xc7/0x1d0 kernel/entry/common.c:173 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline] syscall_exit_to_user_mode+0x24a/0x340 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f7aceeaf449 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f7ace668218 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: fffffffffffffffc RBX: 00007f7acef39328 RCX: 00007f7aceeaf449 RDX: 000000000000000e RSI: 0000000020000100 RDI: 0000000000000004 RBP: 00007f7acef39320 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000000004 R14: 00007f7ace668670 R15: 000000000000000b </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59 Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246 RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: 
dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0 R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122 R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00 FS: 00007f7ace6686c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7acef05b08 CR3: 000000003527c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Reported-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com Tested-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com Fixes: b4f82f9ed43a ("Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
2025-02-12  net: avoid unconditionally touching sk_tsflags on RX  Paolo Abeni
After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"), the sk_tsflags field shares the same cacheline with sk_forward_alloc. The UDP protocol does not acquire the sock lock in the RX path; forward allocations are protected via the receive queue spinlock; additionally udp_recvmsg() calls sock_recv_cmsgs() unconditionally touching sk_tsflags on each packet reception. Due to the above, under high packet rate traffic, when the BH and the user-space process run on different CPUs, UDP packet reception experiences a cache miss while accessing sk_tsflags. The receive path doesn't strictly need to access the problematic field; change sock_set_timestamping() to maintain the relevant information in a newly allocated sk_flags bit, so that sock_recv_cmsgs() can take decisions accessing the latter field only. With this patch applied, on an AMD EPYC server with i40e NICs, I measured a 10% performance improvement for small packets UDP flood performance tests - possibly a larger delta could be observed with more recent H/W. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/dbd18c8a1171549f8249ac5a8b30b1b5ec88a425.1739294057.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
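The idea of mirroring "is any timestamping option set?" into a flags word that the receive path already reads can be modelled in user space. The following is only a sketch under stated assumptions: struct model_sock, the MODEL_FLAG_* bits and both helpers are hypothetical names made up for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real kernel bits have different names/values. */
#define MODEL_FLAG_RCVTSTAMP        (1u << 0)
#define MODEL_FLAG_TIMESTAMPING_ANY (1u << 1)

/* Hypothetical stand-in for the two fields discussed in the commit. */
struct model_sock {
	uint32_t flags;   /* word the RX fast path reads anyway */
	uint32_t tsflags; /* detailed options; on a contended cacheline in the real struct */
};

/* Slow path: store the options and mirror "anything set?" into flags. */
static void model_set_timestamping(struct model_sock *sk, uint32_t tsflags)
{
	assert(sk);
	sk->tsflags = tsflags;
	if (tsflags)
		sk->flags |= MODEL_FLAG_TIMESTAMPING_ANY;
	else
		sk->flags &= ~MODEL_FLAG_TIMESTAMPING_ANY;
}

/* Fast path: decide from flags alone, without reading tsflags. */
static bool model_rx_needs_cmsgs(const struct model_sock *sk)
{
	assert(sk);
	return (sk->flags & (MODEL_FLAG_RCVTSTAMP |
			     MODEL_FLAG_TIMESTAMPING_ANY)) != 0;
}
```

In this model the per-packet check never dereferences tsflags unless the summary bit says there is work to do, which is the property the commit message relies on.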
2025-02-12  net: report csum_complete via qstats  Jakub Kicinski
Commit 13c7c941e729 ("netdev: add qstat for csum complete") reserved the entry for csum complete in the qstats uAPI. Start reporting this value now that we have a driver which needs it. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11  tcp: add tcp_rto_max_ms sysctl  Eric Dumazet
Previous patch added a TCP_RTO_MAX_MS socket option to tune a TCP socket max RTO value. Many setups prefer to change a per netns sysctl. This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms Its initial value is 120000 (120 seconds). Keep in mind that a decrease of tcp_rto_max_ms means shorter overall timeouts, unless tcp_retries2 sysctl is increased. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
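The interaction between the RTO cap and tcp_retries2 can be illustrated with a simplified model. This is a sketch, not kernel code: it assumes a 200 ms initial RTO, pure doubling on every retransmission, and a hypothetical helper name model_total_timeout_ms().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: total time spent waiting when the first RTO is
 * rto_base_ms, each retransmission doubles it, and the value is capped
 * at rto_max_ms. "retries" plays the role of tcp_retries2.
 */
static uint64_t model_total_timeout_ms(uint64_t rto_base_ms,
				       uint64_t rto_max_ms,
				       unsigned int retries)
{
	uint64_t rto = rto_base_ms;
	uint64_t total = 0;

	assert(rto_base_ms > 0 && rto_base_ms <= rto_max_ms);

	/* One wait for the original transmission plus one per retry. */
	for (unsigned int i = 0; i <= retries; i++) {
		total += rto;
		rto = (rto * 2 > rto_max_ms) ? rto_max_ms : rto * 2;
	}
	return total;
}
```

Under these assumptions, model_total_timeout_ms(200, 120000, 15) gives 924600 ms, while lowering the cap to 30000 ms with the same 15 retries gives 291000 ms, which is why a smaller tcp_rto_max_ms shortens the overall timeout unless the retry count is raised.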
2025-02-11  tcp: add the ability to control max RTO  Eric Dumazet
Currently, TCP stack uses a constant (120 seconds) to limit the RTO value exponential growth. Some applications want to set a lower value. Add TCP_RTO_MAX_MS socket option to set a value (in ms) between 1 and 120 seconds. It is discouraged to change the socket rto max on a live socket, as it might lead to unexpected disconnects. Following patch is adding a netns sysctl to control the default value at socket creation time. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
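From user space, the option would be set with setsockopt() at the IPPROTO_TCP level. The sketch below is hedged: set_tcp_rto_max_ms() is a hypothetical wrapper, the fallback value 44 is what recent include/uapi/linux/tcp.h is believed to use and should be verified against your headers, and the call is only expected to succeed on kernels that carry this patch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Older libc headers may not define the option. 44 is assumed here;
 * check include/uapi/linux/tcp.h for your kernel before relying on it.
 */
#ifndef TCP_RTO_MAX_MS
#define TCP_RTO_MAX_MS 44
#endif

/* Hypothetical helper: returns 0 on success, -1 with errno set. */
static int set_tcp_rto_max_ms(int fd, int rto_max_ms)
{
	/* Mirror the documented range (1 to 120 seconds) before the syscall. */
	if (rto_max_ms < 1000 || rto_max_ms > 120000) {
		errno = EINVAL;
		return -1;
	}
	return setsockopt(fd, IPPROTO_TCP, TCP_RTO_MAX_MS,
			  &rto_max_ms, sizeof(rto_max_ms));
}
```

Following the advice in the commit message, a caller would normally invoke this right after socket() and before connect() or listen(), rather than on a live connection.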
2025-02-11  tcp: add a @pace_delay parameter to tcp_reset_xmit_timer()  Eric Dumazet
We want to factorize calls to inet_csk_reset_xmit_timer(), to ease TCP_RTO_MAX change. Current users want to add tcp_pacing_delay(sk) to the timeout. Remaining calls to inet_csk_reset_xmit_timer() do not add the pacing delay. Following patch will convert them, passing false for @pace_delay. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11  tcp: remove tcp_reset_xmit_timer() @max_when argument  Eric Dumazet
All callers use TCP_RTO_MAX, we can factorize this constant, becoming a variable soon. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11  wifi: mac80211: set ieee80211_prep_tx_info::link_id upon Auth Rx  Emmanuel Grumbach
This will be used by the low level driver. Note that link_id will be 0 in case of a non-MLO authentication. Also fix a call-site of mgd_prepare_tx() where the link_id was not populated. Update the documentation to reflect the current state: ieee80211_prep_tx_info::link_id is also available in mgd_complete_tx(). Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250205110958.6a590f189ce5.I1fc5c0da26b143f5b07191eb592f01f7083d55ae@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-11  wifi: mac80211: add strict mode disabling workarounds  Johannes Berg
Add a strict mode where we disable certain workarounds and have additional checks such as, for now, that VHT capabilities from association response match those from beacon/probe response. We can extend the checks in the future. Make it an opt-in setting by the driver so it can be set there in some driver-specific way, for example. Also allow setting this one hw flag through the hwflags debugfs, by writing a new strict=0 or strict=1 value. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250205110958.5cecb0469479.I4a69617dc60ba0d6308416ffbc3102cfd08ba068@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-11  wifi: mac80211: Add support for EPCS configuration  Ilan Peer
Add support for configuring EPCS state: - When EPCS is enabled, send an EPCS enable request action frame to the AP. When the AP replies with EPCS enable response, enable EPCS by applying the QoS parameters provided by the AP. Do so for all the valid MLD links. Once EPCS is enabled, support processing of unsolicited EPCS enable response frames. - When EPCS is disabled, send an EPCS teardown request to the AP and apply the QoS parameters as obtained from the last received beacons. Do so for all the valid links. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250205110958.7a90afd7e140.I3f602d65f5c1fd849d6c70b12307dda33aa91ccb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-11  wifi: mac80211: Drop cooked monitor support  Alexander Wetzel
Hostapd switched from cooked monitor interfaces to nl80211 Dec 2011. Drop support for the outdated cooked monitor interfaces and fix creating the virtual monitor interfaces in the following cases: 1) We have one non-monitor and one monitor interface with %MONITOR_FLAG_ACTIVE enabled and then delete the non-monitor interface. 2) We only have monitor interfaces enabled on resume while at least one has %MONITOR_FLAG_ACTIVE set. Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Link: https://patch.msgid.link/20250204111352.7004-2-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-11  wifi: nl80211/cfg80211: Stop supporting cooked monitor  Alexander Wetzel
Unconditionally start to refuse creating cooked monitor interfaces to phase them out. There is no feature flag for drivers to opt-in for cooked monitor and all known users are using/preferring the modern API since the hostapd release 1.0 in May 2012. Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Link: https://patch.msgid.link/20250204111352.7004-1-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-10  net: fib_rules: Factorise fib_newrule() and fib_delrule().  Kuniyuki Iwashima
fib_nl_newrule() / fib_nl_delrule() are the doit() handlers for RTM_NEWRULE / RTM_DELRULE but are also called from vrf_newlink(). Currently, we hold RTNL on both paths but will not on the former. Also, we set dev_net(dev)->rtnl to skb->sk in vrf_fib_rule() because fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb->sk). Let's factorise the two functions and pass net and rtnl_held flag. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10  vrf: use RCU protection in l3mdev_l3_out()  Eric Dumazet
l3mdev_l3_out() can be called without RCU being held:

raw_sendmsg()
 ip_push_pending_frames()
  ip_send_skb()
   ip_local_out()
    __ip_local_out()
     l3mdev_ip_out()

Add rcu_read_lock() / rcu_read_unlock() pair to avoid a potential UAF.

Fixes: a8e3e1a9f020 ("net: l3mdev: Add hook to output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10  xsk: add helper to get &xdp_desc's DMA and meta pointer in one go  Alexander Lobakin
Currently, when your driver supports XSk Tx metadata and you want to send an XSk frame, you need to do the following:

* call external xsk_buff_raw_get_dma();
* call inline xsk_buff_get_metadata(), which calls external xsk_buff_raw_get_data() and then do some inline checks.

This effectively means that the following piece:

	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;

is done twice per frame, plus you have 2 external calls per frame, plus this:

	meta = pool->addrs + addr - pool->tx_metadata_len;
	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))

is always inlined, even if there's no meta or it's invalid.

Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that in one go. It returns a small structure with 2 fields: DMA address, filled unconditionally, and metadata pointer, non-NULL only if it's present and valid. The address correction is performed only once and you also have only 1 external call per XSk frame, which does all the calculations and checks outside of your hotpath. You only need to check `if (ctx.meta)` for the metadata presence. To not copy any existing code, derive address correction and getting virtual and DMA address into small helpers. bloat-o-meter reports no object code changes for the existing functionality.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-5-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
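The "one call returns both the DMA address and the metadata pointer" shape can be sketched as a user-space model. Everything here is an assumption made for illustration: struct model_pool, struct model_desc_ctx, the 48-bit base/offset split, the flat dma_base mapping and the MODEL_META_VALID_FLAGS mask are hypothetical stand-ins, not the kernel's xsk_buff_pool or xp_raw_get_ctx().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: low 48 bits hold the base, high 16 bits an offset. */
#define MODEL_OFFSET_SHIFT     48
#define MODEL_ADDR_MASK        ((1ULL << MODEL_OFFSET_SHIFT) - 1)
#define MODEL_META_VALID_FLAGS 0x3ULL /* made-up set of accepted flag bits */

struct model_meta {
	uint64_t flags;
};

struct model_pool {
	uint8_t *addrs;           /* start of the buffer area (virtual) */
	uint64_t dma_base;        /* same area as seen by the device (flat model) */
	uint32_t tx_metadata_len; /* 0 when Tx metadata is not in use */
	int unaligned;            /* non-zero: address packs base + offset */
};

struct model_desc_ctx {
	uint64_t dma;            /* filled unconditionally */
	struct model_meta *meta; /* NULL unless present and valid */
};

static struct model_desc_ctx model_raw_get_ctx(const struct model_pool *pool,
					       uint64_t addr)
{
	struct model_desc_ctx ctx;
	uint64_t flags;

	/* The address correction is done exactly once. */
	if (pool->unaligned)
		addr = (addr & MODEL_ADDR_MASK) + (addr >> MODEL_OFFSET_SHIFT);

	ctx.dma = pool->dma_base + addr;
	ctx.meta = NULL;

	if (pool->tx_metadata_len) {
		uint8_t *raw = pool->addrs + addr - pool->tx_metadata_len;

		/* Read the flags via memcpy to avoid alignment assumptions. */
		memcpy(&flags, raw, sizeof(flags));
		if (!(flags & ~MODEL_META_VALID_FLAGS))
			ctx.meta = (struct model_meta *)raw;
	}
	return ctx;
}
```

A caller in this model only needs to test ctx.meta for NULL after the single call, mirroring the `if (ctx.meta)` check mentioned in the commit message.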
2025-02-08  lib/crc32: rename __crc32c_le_combine() to crc32c_combine()  Eric Biggers
Since the Castagnoli CRC32 is now always just crc32c(), rename __crc32c_le_combine() and __crc32c_le_shift() accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08  lib/crc32: standardize on crc32c() name for Castagnoli CRC32  Eric Biggers
For historical reasons, the Castagnoli CRC32 is available under 3 names: crc32c(), crc32c_le(), and __crc32c_le(). Most callers use crc32c(). The more verbose versions are not really warranted; there is no "_be" version that the "_le" version needs to be differentiated from, and the leading underscores are pointless. Therefore, let's standardize on just crc32c(). Remove the other two names, and update callers accordingly. Specifically, the new crc32c() comes from what was previously __crc32c_le(), so compared to the old crc32c() it now takes a size_t length rather than unsigned int, and it's now in linux/crc32.h instead of just linux/crc32c.h (which includes linux/crc32.h). Later patches will also rename __crc32c_le_combine(), crc32c_le_base(), and crc32c_le_arch(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
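For readers unfamiliar with the function being renamed, here is a user-space sketch of a bit-at-a-time Castagnoli CRC32 with the argument shape described above (seed, data pointer, size_t length). The name crc32c_sketch() is hypothetical, and the sketch assumes the caller supplies the seed and applies any final inversion; it is not the kernel's implementation, which uses tables or CPU instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Bitwise CRC32C update: folds len bytes of data into crc and returns
 * the new value. No inversion is applied on entry or exit.
 */
static uint32_t crc32c_sketch(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC32C_POLY_REFLECTED : 0);
	}
	return crc;
}
```

With an all-ones seed and a final inversion, feeding the nine ASCII bytes "123456789" yields the standard CRC-32C check value 0xE3069283.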
2025-02-07  net: devmem: don't call queue stop / start when the interface is down  Jakub Kicinski
We seem to be missing a netif_running() check from the devmem installation path. Starting a queue on a stopped device makes no sense. We still want to be able to allocate the memory, just to test that the device is indeed setting up the page pools in a memory provider compatible way. This is not a bug fix, because existing drivers check if the interface is down as part of the ops. But new drivers shouldn't have to do this, as long as they can correctly alloc/free while down. Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250206225638.1387810-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07  tcp: rename inet_csk_{delete|reset}_keepalive_timer()  Eric Dumazet
inet_csk_delete_keepalive_timer() and inet_csk_reset_keepalive_timer() are only used from core TCP, there is no need to export them. Replace their prefix by tcp. Move them to net/ipv4/tcp_timer.c and make tcp_delete_keepalive_timer() static. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250206094605.2694118-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>