summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-07-03ipv6: Cleanup fib6_drop_pcpu_from()Yue Haibing
Since commit 0e2338749192 ("ipv6: fix races in ip6_dst_destroy()"), 'table' is unused in __fib6_drop_pcpu_from(), no need pass it from fib6_drop_pcpu_from(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250701041235.1333687-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03netfilter: conntrack: remove DCCP protocol supportPablo Neira Ayuso
The DCCP socket family has now been removed from this tree, see: 8bb3212be4b4 ("Merge branch 'net-retire-dccp-socket'") Remove connection tracking and NAT support for this protocol, this should not pose a problem because no DCCP traffic is expected to be seen on the wire. As for the code for matching on dccp header for iptables and nftables, mark it as deprecated and keep it in place. Ruleset restoration is an atomic operation. Without dccp matching support, an astray match on dccp could break this operation leaving your computer with no policy in place, so let's follow a more conservative approach for matches. Add CONFIG_NFT_EXTHDR_DCCP which is set to 'n' by default to deprecate dccp extension support. Similarly, label CONFIG_NETFILTER_XT_MATCH_DCCP as deprecated too and also set it to 'n' by default. Code to match on DCCP protocol from ebtables also remains in place, this is just a few checks on IPPROTO_DCCP from _check() path which is exercised when ruleset is loaded. There is another use of IPPROTO_DCCP from the _check() path in the iptables multiport match. Another check for IPPROTO_DCCP from the packet in the reject target is also removed. So let's schedule removal of the dccp matching for a second stage, this should not interfer with the dccp retirement since this is only matching on the dccp header. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-03vsock/vmci: Clear the vmci transport packet properly when initializing itHarshaVardhana S A
In vmci_transport_packet_init memset the vmci_transport_packet before populating the fields to avoid any uninitialised data being left in the structure. Cc: Bryan Tan <bryan-bt.tan@broadcom.com> Cc: Vishnu Dasa <vishnu.dasa@broadcom.com> Cc: Broadcom internal kernel review list Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: virtualization@lists.linux.dev Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Signed-off-by: HarshaVardhana S A <harshavardhana.sa@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-02net: ipv6: Fix spelling mistakeChenguang Zhao
change 'Maximium' to 'Maximum' Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250702055820.112190-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02devlink: Extend devlink rate API with traffic classes bandwidth managementCarolina Jubran
Introduce support for specifying relative bandwidth shares between traffic classes (TC) in the devlink-rate API. This new option allows users to allocate bandwidth across multiple traffic classes in a single command. This feature provides a more granular control over traffic management, especially for scenarios requiring Enhanced Transmission Selection. Users can now define a relative bandwidth share for each traffic class. For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5 (RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the total bandwidth. The actual percentage each class receives depends on the ratio of its share value to the sum of all shares. Example: DEV=pci/0000:08:00.0 $ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \ tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 $ devlink port function rate set $DEV/vfs_group \ tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0 Example usage with ynl: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"rate-tc-index": 0, "rate-tc-bw": 50}, {"rate-tc-index": 1, "rate-tc-bw": 50}, {"rate-tc-index": 2, "rate-tc-bw": 0}, {"rate-tc-index": 3, "rate-tc-bw": 0}, {"rate-tc-index": 4, "rate-tc-bw": 0}, {"rate-tc-index": 5, "rate-tc-bw": 0}, {"rate-tc-index": 6, "rate-tc-bw": 0}, {"rate-tc-index": 7, "rate-tc-bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0}, {'rate-tc-bw': 50, 'rate-tc-index': 1}, {'rate-tc-bw': 0, 'rate-tc-index': 2}, {'rate-tc-bw': 0, 'rate-tc-index': 3}, {'rate-tc-bw': 0, 'rate-tc-index': 4}, {'rate-tc-bw': 0, 'rate-tc-index': 5}, {'rate-tc-bw': 0, 'rate-tc-index': 6}, {'rate-tc-bw': 0, 'rate-tc-index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: preserve MSG_ZEROCOPY with forwardingWillem de Bruijn
MSG_ZEROCOPY data must be copied before data is queued to local sockets, to avoid indefinite timeout until memory release. This test is performed by skb_orphan_frags_rx, which is called when looping an egress skb to packet sockets, error queue or ingress path. To preserve zerocopy for skbs that are looped to ingress but are then forwarded to an egress device rather than delivered locally, defer this last check until an skb enters the local IP receive path. This is analogous to existing behavior of skb_clear_delivery_time. Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630194312.1571410-2-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: ipv4: fix stat increase when udp early demux drops the packetAntoine Tenart
udp_v4_early_demux now returns drop reasons as it either returns 0 or ip_mc_validate_source, which returns itself a drop reason. However its use was not converted in ip_rcv_finish_core and the drop reason is ignored, leading to potentially skipping increasing LINUX_MIB_IPRPFILTER if the drop reason is SKB_DROP_REASON_IP_RPFILTER. This is a fix and we're not converting udp_v4_early_demux to explicitly return a drop reason to ease backports; this can be done as a follow-up. Fixes: d46f827016d8 ("net: ip: make ip_mc_validate_source() return drop reason") Cc: Menglong Dong <menglong8.dong@gmail.com> Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250701074935.144134-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net/sched: Always pass notifications when child class becomes emptyLion Ackermann
Certain classful qdiscs may invoke their classes' dequeue handler on an enqueue operation. This may unexpectedly empty the child qdisc and thus make an in-flight class passive via qlen_notify(). Most qdiscs do not expect such behaviour at this point in time and may re-activate the class eventually anyways which will lead to a use-after-free. The referenced fix commit attempted to fix this behavior for the HFSC case by moving the backlog accounting around, though this turned out to be incomplete since the parent's parent may run into the issue too. The following reproducer demonstrates this use-after-free: tc qdisc add dev lo root handle 1: drr tc filter add dev lo parent 1: basic classid 1:1 tc class add dev lo parent 1: classid 1:1 drr tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1 tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0 tc qdisc add dev lo parent 2:1 handle 3: netem tc qdisc add dev lo parent 3:1 handle 4: blackhole echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 tc class delete dev lo classid 1:1 echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 Since backlog accounting issues leading to a use-after-frees on stale class pointers is a recurring pattern at this point, this patch takes a different approach. Instead of trying to fix the accounting, the patch ensures that qdisc_tree_reduce_backlog always calls qlen_notify when the child qdisc is empty. This solves the problem because deletion of qdiscs always involves a call to qdisc_reset() and / or qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing the following qdisc_tree_reduce_backlog() to report to the parent. Note that this may call qlen_notify on passive classes multiple times. This is not a problem after the recent patch series that made all the classful qdiscs qlen_notify() handlers idempotent. Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()") Signed-off-by: Lion Ackermann <nnamrec@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02ipv6: ip6_mc_input() and ip6_mr_input() cleanupsEric Dumazet
Both functions are always called under RCU. We remove the extra rcu_read_lock()/rcu_read_unlock(). We use skb_dst_dev_net_rcu() instead of skb_dst_dev_net(). We use dev_net_rcu() instead of dev_net(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02ipv6: adopt skb_dst_dev() and skb_dst_dev_net[_rcu]() helpersEric Dumazet
Use the new helpers as a step to deal with potential dst->dev races. v2: fix typo in ipv6_rthdr_rcv() (kernel test robot <lkp@intel.com>) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02ipv6: adopt dst_dev() helperEric Dumazet
Use the new helper as a step to deal with potential dst->dev races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]Eric Dumazet
Use the new helpers as a first step to deal with potential dst->dev races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: add four helpers to annotate data-races around dst->devEric Dumazet
dst->dev is read locklessly in many contexts, and written in dst_dev_put(). Fixing all the races is going to need many changes. We probably will have to add full RCU protection. Add three helpers to ease this painful process. static inline struct net_device *dst_dev(const struct dst_entry *dst) { return READ_ONCE(dst->dev); } static inline struct net_device *skb_dst_dev(const struct sk_buff *skb) { return dst_dev(skb_dst(skb)); } static inline struct net *skb_dst_dev_net(const struct sk_buff *skb) { return dev_net(skb_dst_dev(skb)); } static inline struct net *skb_dst_dev_net_rcu(const struct sk_buff *skb) { return dev_net_rcu(skb_dst_dev(skb)); } Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: annotate data-races around dst->outputEric Dumazet
dst_dev_put() can overwrite dst->output while other cpus might read this field (for instance from dst_output()) Add READ_ONCE()/WRITE_ONCE() annotations to suppress potential issues. We will likely need RCU protection in the future. Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: annotate data-races around dst->inputEric Dumazet
dst_dev_put() can overwrite dst->input while other cpus might read this field (for instance from dst_input()) Add READ_ONCE()/WRITE_ONCE() annotations to suppress potential issues. We will likely need full RCU protection later. Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: annotate data-races around dst->lastuseEric Dumazet
(dst_entry)->lastuse is read and written locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: annotate data-races around dst->expiresEric Dumazet
(dst_entry)->expires is read and written locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: dst: annotate data-races around dst->obsoleteEric Dumazet
(dst_entry)->obsolete is read locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02udp: move udp_memory_allocated into net_aligned_dataEric Dumazet
____cacheline_aligned_in_smp attribute only makes sure to align a field to a cache line. It does not prevent the linker to use the remaining of the cache line for other variables, causing potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02tcp: move tcp_memory_allocated into net_aligned_dataEric Dumazet
____cacheline_aligned_in_smp attribute only makes sure to align a field to a cache line. It does not prevent the linker to use the remaining of the cache line for other variables, causing potential false sharing. Move tcp_memory_allocated into a dedicated cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: move net_cookie into net_aligned_dataEric Dumazet
Using per-cpu data for net->net_cookie generation is overkill, because even busy hosts do not create hundreds of netns per second. Make sure to put net_cookie in a private cache line to avoid potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02net: add struct net_aligned_dataEric Dumazet
This structure will hold networking data that must consume a full cache line to avoid accidental false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01seg6: fix lenghts typo in a commentAndrea Mayer
Fix a typo: lenghts -> length The typo has been identified using codespell, and the tool currently does not report any additional issues in comments. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629171226.4988-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01rose: fix dangling neighbour pointers in rose_rt_device_down()Kohei Enju
There are two bugs in rose_rt_device_down() that can cause use-after-free: 1. The loop bound `t->count` is modified within the loop, which can cause the loop to terminate early and miss some entries. 2. When removing an entry from the neighbour array, the subsequent entries are moved up to fill the gap, but the loop index `i` is still incremented, causing the next entry to be skipped. For example, if a node has three neighbours (A, A, B) with count=3 and A is being removed, the second A is not checked. i=0: (A, A, B) -> (A, B) with count=2 ^ checked i=1: (A, B) -> (A, B) with count=2 ^ checked (B, not A!) i=2: (doesn't occur because i < count is false) This leaves the second A in the array with count=2, but the rose_neigh structure has been freed. Code that accesses these entries assumes that the first `count` entries are valid pointers, causing a use-after-free when it accesses the dangling pointer. Fix both issues by iterating over the array in reverse order with a fixed loop bound. This ensures that all entries are examined and that the removal of an entry doesn't affect subsequent iterations. Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01ip6_tunnel: enable to change proto of fb tunnelsNicolas Dichtel
This is possible via the ioctl API: > ip -6 tunnel change ip6tnl0 mode any Let's align the netlink API: > ip link set ip6tnl0 type ip6tnl mode any Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250630145602.1027220-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01net: ethtool: fix leaking netdev ref if ethnl_default_parse() failedJakub Kicinski
Ido spotted that I made a mistake in commit under Fixes, ethnl_default_parse() may acquire a dev reference even when it returns an error. This may have been driven by the code structure in dumps (which unconditionally release dev before handling errors), but it's too much of a trap. Functions should undo what they did before returning an error, rather than expecting caller to clean up. Rather than fixing ethnl_default_set_doit() directly make ethnl_default_parse() clean up errors. Reported-by: Ido Schimmel <idosch@idosch.org> Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01Merge tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: - Fix loop in GSS sequence number cache - Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails - Fix a race to wake on NFS_LAYOUT_DRAIN - Fix handling of NFS level errors in I/O * tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4/flexfiles: Fix handling of NFS level errors in I/O NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails. sunrpc: fix loop in gss seqno cache
2025-07-01net: ieee8021q: fix insufficient table-size assertionRubenKelevra
_Static_assert(ARRAY_SIZE(map) != IEEE8021Q_TT_MAX - 1) rejects only a length of 7 and allows any other mismatch. Replace it with a strict equality test via a helper macro so that every mapping table must have exactly IEEE8021Q_TT_MAX (8) entries. Signed-off-by: RubenKelevra <rubenkelevra@gmail.com> Link: https://patch.msgid.link/20250626205907.1566384-1-rubenkelevra@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-30net: net->nsid_lock does not need BH safetyEric Dumazet
At the time of commit bc51dddf98c9 ("netns: avoid disabling irq for netns id") peernet2id() was not yet using RCU. Commit 2dce224f469f ("netns: protect netns ID lookups with RCU") changed peernet2id() to no longer acquire net->nsid_lock (potentially from BH context). We do not need to block soft interrupts when acquiring net->nsid_lock anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250627163242.230866-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30neighbor: Add NTF_EXT_VALIDATED flag for externally validated entriesIdo Schimmel
tl;dr ===== Add a new neighbor flag ("extern_valid") that can be used to indicate to the kernel that a neighbor entry was learned and determined to be valid externally. The kernel will not try to remove or invalidate such an entry, leaving these decisions to the user space control plane. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches (VTEPs). VTEPs that are connected to the same ES are called ES peers. When a neighbor entry is learned on a VTEP, it is distributed to both ES peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers use the neighbor entry when routing traffic towards the multi-homed host and remote VTEPs use it for ARP/NS suppression. Motivation ========== If the ES link between a host and the VTEP on which the neighbor entry was locally learned goes down, the EVPN MAC/IP advertisement route will be withdrawn and the neighbor entries will be removed from both ES peers and remote VTEPs. Routing towards the multi-homed host and ARP/NS suppression can fail until another ES peer locally learns the neighbor entry and distributes it via an EVPN MAC/IP advertisement route. "draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these intermittent failures by having the ES peers install the neighbor entries as before, but also injecting EVPN MAC/IP advertisement routes with a proxy indication. When the previously mentioned ES link goes down and the original EVPN MAC/IP advertisement route is withdrawn, the ES peers will not withdraw their neighbor entries, but instead start aging timers for the proxy indication. If an ES peer locally learns the neighbor entry (i.e., it becomes "reachable"), it will restart its aging timer for the entry and emit an EVPN MAC/IP advertisement route without a proxy indication. An ES peer will stop its aging timer for the proxy indication if it observes the removal of the proxy indication from at least one of the ES peers advertising the entry. In the event that the aging timer for the proxy indication expired, an ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer expired on all ES peers and they all withdrew their proxy advertisements, the neighbor entry will be completely removed from the EVPN fabric. Implementation ============== In the above scheme, when the control plane (e.g., FRR) advertises a neighbor entry with a proxy indication, it expects the corresponding entry in the data plane (i.e., the kernel) to remain valid and not be removed due to garbage collection or loss of carrier. The control plane also expects the kernel to notify it if the entry was learned locally (i.e., became "reachable") so that it will remove the proxy indication from the EVPN MAC/IP advertisement route. That is why these entries cannot be programmed with dummy states such as "permanent" or "noarp". Instead, add a new neighbor flag ("extern_valid") which indicates that the entry was learned and determined to be valid externally and should not be removed or invalidated by the kernel. The kernel can probe the entry and notify user space when it becomes "reachable" (it is initially installed as "stale"). However, if the kernel does not receive a confirmation, have it return the entry to the "stale" state instead of the "failed" state. In other words, an entry marked with the "extern_valid" flag behaves like any other dynamically learned entry other than the fact that the kernel cannot remove or invalidate it. One can argue that the "extern_valid" flag should not prevent garbage collection and that instead a neighbor entry should be programmed with both the "extern_valid" and "extern_learn" flags. There are two reasons for not doing that: 1. Unclear why a control plane would like to program an entry that the kernel cannot invalidate but can completely remove. 2. The "extern_learn" flag is used by FRR for neighbor entries learned on remote VTEPs (for ARP/NS suppression) whereas here we are concerned with local entries. This distinction is currently irrelevant for the kernel, but might be relevant in the future. Given that the flag only makes sense when the neighbor has a valid state, reject attempts to add a neighbor with an invalid state and with this flag set. For example: # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid Error: Cannot create externally validated neighbor with an invalid state. # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid Error: Cannot mark neighbor as externally validated with an invalid state. The above means that a neighbor cannot be created with the "extern_valid" flag and flags such as "use" or "managed" as they result in a neighbor being created with an invalid state ("none") and immediately getting probed: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use Error: Cannot create externally validated neighbor with an invalid state. However, these flags can be used together with "extern_valid" after the neighbor was created with a valid state: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use One consequence of preventing the kernel from invalidating a neighbor entry is that by default it will only try to determine reachability using unicast probes. This can be changed using the "mcast_resolicit" sysctl: # sysctl net.ipv4.neigh.br0/10.mcast_resolicit 0 # tcpdump -nn -e -i br0.10 -Q out arp & # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 # sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 iproute2 patches can be found here [2]. [1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03 [2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30ipv6: guard ip6_mr_output() with rcuEric Dumazet
syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: net/ipv6/ip6mr.c:2376 at ip6_mr_output+0xe0b/0x1040 net/ipv6/ip6mr.c:2376, CPU#1: kworker/1:2/121 Call Trace: <TASK> ip6tunnel_xmit include/net/ip6_tunnel.h:162 [inline] udp_tunnel6_xmit_skb+0x640/0xad0 net/ipv6/ip6_udp_tunnel.c:112 send6+0x5ac/0x8d0 drivers/net/wireguard/socket.c:152 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:178 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x1c8/0x7c0 drivers/net/wireguard/send.c:276 process_one_work kernel/workqueue.c:3239 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3322 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3403 kthread+0x70e/0x8a0 kernel/kthread.c:464 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: 96e8f5a9fe2d ("net: ipv6: Add ip6_mr_output()") Reported-by: syzbot+0141c834e47059395621@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e86b3.a00a0220.129264.0003.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250627115822.3741390-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30net: ethtool: move get_rxfh callback under the rss_lockJakub Kicinski
We already call get_rxfh under the rss_lock when we read back context state after changes. Let's be consistent and always hold the lock. The existing callers are all under rtnl_lock so this should make no difference in practice, but it makes the locking rules far less confusing IMHO. Any RSS callback and any access to the RSS XArray should hold the lock. Link: https://patch.msgid.link/20250626202848.104457-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30net: ethtool: move rxfh_fields callbacks under the rss_lockJakub Kicinski
Netlink code will want to perform the RSS_SET operation atomically under the rss_lock. sfc wants to hold the rss_lock in rxfh_fields_get, which makes that difficult. Lets move the locking up to the core so that for all driver-facing callbacks rss_lock is taken consistently by the core. Link: https://patch.msgid.link/20250626202848.104457-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30net: ethtool: take rss_lock for all rxfh changesJakub Kicinski
Always take the rss_lock in ethtool_set_rxfh(). We will want to make a similar change in ethtool_set_rxfh_fields() and some drivers lock that callback regardless of rss context ID being set. Having some callbacks locked unconditionally and some only if context ID is set would be very confusing. ethtool handling is under rtnl_lock, so rss_lock is very unlikely to ever be congested. Link: https://patch.msgid.link/20250626202848.104457-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30net: ethtool: avoid OOB accesses in PAUSE_SETJakub Kicinski
We now reuse .parse_request() from GET on SET, so we need to make sure that the policies for both cover the attributes used for .parse_request(). genetlink will only allocate space in info->attrs for ARRAY_SIZE(policy). Reported-by: syzbot+430f9f76633641a62217@syzkaller.appspotmail.com Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250626233926.199801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30wifi: cfg80211: fix S1G beacon head validation in nl80211Lachlan Hodges
S1G beacons contain fixed length optional fields that precede the variable length elements, ensure we take this into account when validating the beacon. This particular case was missed in 1e1f706fc2ce ("wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements"). Fixes: 1d47f1198d58 ("nl80211: correctly validate S1G beacon head") Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250626115118.68660-1-lachlan.hodges@morsemicro.com [shorten/reword subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-28net: ipv4: guard ip_mr_output() with rcuEric Dumazet
syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: CPU: 0 PID: 0 at net/ipv4/ipmr.c:2302 ip_mr_output+0xbb1/0xe70 net/ipv4/ipmr.c:2302 Call Trace: <IRQ> igmp_send_report+0x89e/0xdb0 net/ipv4/igmp.c:799 igmp_timer_expire+0x204/0x510 net/ipv4/igmp.c:-1 call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747 expire_timers kernel/time/timer.c:1798 [inline] __run_timers kernel/time/timer.c:2372 [inline] __run_timer_base+0x61a/0x860 kernel/time/timer.c:2384 run_timer_base kernel/time/timer.c:2393 [inline] run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403 handle_softirqs+0x286/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050 Fixes: 35bec72a24ac ("net: ipv4: Add ip_mr_output()") Reported-by: syzbot+f02fb9e43bd85c6c66ae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e841a.a00a0220.129264.0002.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Machata <petrm@nvidia.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-27tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK rangexin.guo
If the new coming segment covers more than one skbs in the ofo queue, and which seq is equal to rcv_nxt, then the sequence range that is duplicated will be sent as DUP SACK, the detail as below, in step6, the {501,2001} range is clearly including too much DUP SACK range, in violation of RFC 2883 rules. 1. client > server: Flags [.], seq 501:1001, ack 1325288529, win 20000, length 500 2. server > client: Flags [.], ack 1, [nop,nop,sack 1 {501:1001}], length 0 3. client > server: Flags [.], seq 1501:2001, ack 1325288529, win 20000, length 500 4. server > client: Flags [.], ack 1, [nop,nop,sack 2 {1501:2001} {501:1001}], length 0 5. client > server: Flags [.], seq 1:2001, ack 1325288529, win 20000, length 2000 6. server > client: Flags [.], ack 2001, [nop,nop,sack 1 {501:2001}], length 0 After this fix, the final ACK is as below: 6. server > client: Flags [.], ack 2001, options [nop,nop,sack 1 {501:1001}], length 0 [edumazet] added a new packetdrill test in the following patch. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: xin.guo <guoxin0309@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250626123420.1933835-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27tcp: remove inet_rtx_syn_ack()Eric Dumazet
inet_rtx_syn_ack() is a simple wrapper around tcp_rtx_synack(), if we move req->num_retrans update. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250626153017.2156274-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27tcp: remove rtx_syn_ack fieldEric Dumazet
Now inet_rtx_syn_ack() is only used by TCP, it can directly call tcp_rtx_synack() instead of using an indirect call to req->rsk_ops->rtx_syn_ack(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250626153017.2156274-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Bluetooth: HCI: Set extended advertising data synchronouslyChristian Eggers
Currently, for controllers with extended advertising, the advertising data is set in the asynchronous response handler for extended adverstising params. As most advertising settings are performed in a synchronous context, the (asynchronous) setting of the advertising data is done too late (after enabling the advertising). Move setting of adverstising data from asynchronous response handler into synchronous context to fix ordering of HCI commands. Signed-off-by: Christian Eggers <ceggers@arri.de> Fixes: a0fb3726ba55 ("Bluetooth: Use Set ext adv/scan rsp data if controller supports") Cc: stable@vger.kernel.org v2: https://lore.kernel.org/linux-bluetooth/20250626115209.17839-1-ceggers@arri.de/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-06-27Bluetooth: MGMT: mesh_send: check instances prior disabling advertisingChristian Eggers
The unconditional call of hci_disable_advertising_sync() in mesh_send_done_sync() also disables other LE advertisings (non mesh related). I am not sure whether this call is required at all, but checking the adv_instances list (like done at other places) seems to solve the problem. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-06-27Bluetooth: MGMT: set_mesh: update LE scan interval and windowChristian Eggers
According to the message of commit b338d91703fa ("Bluetooth: Implement support for Mesh"), MGMT_OP_SET_MESH_RECEIVER should set the passive scan parameters. Currently the scan interval and window parameters are silently ignored, although user space (bluetooth-meshd) expects that they can be used [1] [1] https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/mesh/mesh-io-mgmt.c#n344 Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-06-27Bluetooth: hci_sync: revert some mesh modificationsChristian Eggers
This reverts minor parts of the changes made in commit b338d91703fa ("Bluetooth: Implement support for Mesh"). It looks like these changes were only made for development purposes but shouldn't have been part of the commit. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-06-27Bluetooth: Prevent unintended pause by checking if advertising is activeYang Li
When PA Create Sync is enabled, advertising resumes unexpectedly. Therefore, it's necessary to check whether advertising is currently active before attempting to pause it. < HCI Command: LE Add Device To... (0x08|0x0011) plen 7 #1345 [hci0] 48.306205 Address type: Random (0x01) Address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) < HCI Command: LE Set Address Re.. (0x08|0x002d) plen 1 #1347 [hci0] 48.308023 Address resolution: Enabled (0x01) ... < HCI Command: LE Set Extended A.. (0x08|0x0039) plen 6 #1349 [hci0] 48.309650 Extended advertising: Enabled (0x01) Number of sets: 1 (0x01) Entry 0 Handle: 0x01 Duration: 0 ms (0x00) Max ext adv events: 0 ... < HCI Command: LE Periodic Adve.. (0x08|0x0044) plen 14 #1355 [hci0] 48.314575 Options: 0x0000 Use advertising SID, Advertiser Address Type and address Reporting initially enabled SID: 0x02 Adv address type: Random (0x01) Adv address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) Skip: 0x0000 Sync timeout: 20000 msec (0x07d0) Sync CTE type: 0x0000 Fixes: ad383c2c65a5 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled") Signed-off-by: Yang Li <yang.li@amlogic.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-06-26ipv4: fib: Remove unnecessary encap_type checkYue Haibing
lwtunnel_build_state() has check validity of encap_type, so no need to do this before call it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250625022059.3958215-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2025-06-27 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 120 insertions(+), 20 deletions(-). The main changes are: 1) Fix RCU usage in task_cls_state() for BPF programs using helpers like bpf_get_cgroup_classid_curr() outside of networking, from Charalampos Mitrodimas. 2) Fix a sockmap race between map_update and a pending workqueue from an earlier map_delete freeing the old psock where both pointed to the same psock->sk, from Jiayuan Chen. 3) Fix a data corruption issue when using bpf_msg_pop_data() in kTLS which failed to recalculate the ciphertext length, also from Jiayuan Chen. 4) Remove xdp_redirect_map{,_err} trace events since they are unused and also hide XDP trace events under CONFIG_BPF_SYSCALL, from Steven Rostedt. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: xdp: tracing: Hide some xdp events under CONFIG_BPF_SYSCALL xdp: Remove unused events xdp_redirect_map and xdp_redirect_map_err net, bpf: Fix RCU usage in task_cls_state() for BPF programs selftests/bpf: Add test to cover ktls with bpf_msg_pop_data bpf, ktls: Fix data corruption when using bpf_msg_pop_data() in ktls bpf, sockmap: Fix psock incorrectly pointing to sk ==================== Link: https://patch.msgid.link/20250626230111.24772-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc4). Conflicts: Documentation/netlink/specs/mptcp_pm.yaml 9e6dd4c256d0 ("netlink: specs: mptcp: replace underscores with dashes in names") ec362192aa9e ("netlink: specs: fix up indentation errors") https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au Adjacent changes: Documentation/netlink/specs/fou.yaml 791a9ed0a40d ("netlink: specs: fou: replace underscores with dashes in names") 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge tag 'net-6.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and wireless. Current release - regressions: - bridge: fix use-after-free during router port configuration Current release - new code bugs: - eth: wangxun: fix the creation of page_pool Previous releases - regressions: - netpoll: initialize UDP checksum field before checksumming - wifi: mac80211: finish link init before RCU publish - bluetooth: fix use-after-free in vhci_flush() - eth: - ionic: fix DMA mapping test - bnxt: properly flush XDP redirect lists Previous releases - always broken: - netlink: specs: enforce strict naming of properties - unix: don't leave consecutive consumed OOB skbs. - vsock: fix linux/vm_sockets.h userspace compilation errors - selftests: fix TCP packet checksum" * tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: libwx: fix the creation of page_pool net: selftests: fix TCP packet checksum atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister(). netlink: specs: enforce strict naming of properties netlink: specs: tc: replace underscores with dashes in names netlink: specs: rt-link: replace underscores with dashes in names netlink: specs: mptcp: replace underscores with dashes in names netlink: specs: ovs_flow: replace underscores with dashes in names netlink: specs: devlink: replace underscores with dashes in names netlink: specs: dpll: replace underscores with dashes in names netlink: specs: ethtool: replace underscores with dashes in names netlink: specs: fou: replace underscores with dashes in names netlink: specs: nfsd: replace underscores with dashes in names net: enetc: Correct endianness handling in _enetc_rd_reg64 atm: idt77252: Add missing `dma_map_error()` bnxt: properly flush XDP redirect lists vsock/uapi: fix linux/vm_sockets.h userspace compilation errors wifi: mac80211: finish link init before RCU publish wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version selftest: af_unix: Add tests for -ECONNRESET. ...
2025-06-26net: selftests: fix TCP packet checksumJakub Kicinski
The length in the pseudo header should be the length of the L3 payload AKA the L4 header+payload. The selftest code builds the packet from the lower layers up, so all the headers are pushed already when it constructs L4. We need to subtract the lower layer headers from skb->len. Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250624183258.3377740-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>