summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-03-29udp: do not transition UDP GRO fraglist partial checksums to unnecessaryAntoine Tenart
UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets are converted to CHECKSUM_UNNECESSARY to avoid later checks. However this is an issue for CHECKSUM_PARTIAL packets as they can be looped in an egress path and then their partial checksums are not fixed. Different issues can be observed, from invalid checksum on packets to traces like: gen01: hw csum failure skb len=3008 headroom=160 headlen=1376 tailroom=0 mac=(106,14) net=(120,40) trans=160 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0) hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12 ... Fix this by only converting CHECKSUM_NONE packets to CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All other checksum types are kept as-is, including CHECKSUM_COMPLETE as fraglist packets being segmented back would have their skb->csum valid. Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29gro: fix ownership transferAntoine Tenart
If packets are GROed with fraglist they might be segmented later on and continue their journey in the stack. In skb_segment_list those skbs can be reused as-is. This is an issue as their destructor was removed in skb_gro_receive_list but not the reference to their socket, and then they can't be orphaned. Fix this by also removing the reference to the socket. For example this could be observed, kernel BUG at include/linux/skbuff.h:3131! (skb_orphan) RIP: 0010:ip6_rcv_core+0x11bc/0x19a0 Call Trace: ipv6_list_rcv+0x250/0x3f0 __netif_receive_skb_list_core+0x49d/0x8f0 netif_receive_skb_list_internal+0x634/0xd40 napi_complete_done+0x1d2/0x7d0 gro_cell_poll+0x118/0x1f0 A similar construction is found in skb_gro_receive, apply the same change there. Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29udp: do not accept non-tunnel GSO skbs landing in a tunnelAntoine Tenart
When rx-udp-gro-forwarding is enabled UDP packets might be GROed when being forwarded. If such packets might land in a tunnel this can cause various issues and udp_gro_receive makes sure this isn't the case by looking for a matching socket. This is performed in udp4/6_gro_lookup_skb but only in the current netns. This is an issue with tunneled packets when the endpoint is in another netns. In such cases the packets will be GROed at the UDP level, which leads to various issues later on. The same thing can happen with rx-gro-list. We saw this with geneve packets being GROed at the UDP level. In such case gso_size is set; later the packet goes through the geneve rx path, the geneve header is pulled, the offset are adjusted and frag_list skbs are not adjusted with regard to geneve. When those skbs hit skb_fragment, it will misbehave. Different outcomes are possible depending on what the GROed skbs look like; from corrupted packets to kernel crashes. One example is a BUG_ON[1] triggered in skb_segment while processing the frag_list. Because gso_size is wrong (geneve header was pulled) skb_segment thinks there is "geneve header size" of data in frag_list, although it's in fact the next packet. The BUG_ON itself has nothing to do with the issue. This is only one of the potential issues. Looking up for a matching socket in udp_gro_receive is fragile: the lookup could be extended to all netns (not speaking about performances) but nothing prevents those packets from being modified in between and we could still not find a matching socket. It's OK to keep the current logic there as it should cover most cases but we also need to make sure we handle tunnel packets being GROed too early. This is done by extending the checks in udp_unexpected_gso: GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must be segmented. [1] kernel BUG at net/core/skbuff.c:4408! RIP: 0010:skb_segment+0xd2a/0xf70 __udp_gso_segment+0xaa/0x560 Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29net: hsr: Use full string description when opening HSR network deviceLukasz Majewski
Up till now only single character ('A' or 'B') was used to provide information of HSR slave network device status. As it is also possible and valid, that Interlink network device may be supported as well, the description must be more verbose. As a result the full string description is now used. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29net/smc: make smc_hash_sk/smc_unhash_sk staticZhengchao Shao
smc_hash_sk and smc_unhash_sk are only used in af_smc.c, so make them static and remove the output symbol. They can be called under the path .prot->hash()/unhash(). Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29net: sched: make skip_sw actually skip softwareAsbjørn Sloth Tønnesen
TC filters come in 3 variants: - no flag (try to process in hardware, but fallback to software)) - skip_hw (do not process filter by hardware) - skip_sw (do not process filter by software) However skip_sw is implemented so that the skip_sw flag can first be checked, after it has been matched. IMHO it's common when using skip_sw, to use it on all rules. So if all filters in a block is skip_sw filters, then we can bail early, we can thus avoid having to match the filters, just to check for the skip_sw flag. This patch adds a bypass, for when only TC skip_sw rules are used. The bypass is guarded by a static key, to avoid harming other workloads. There are 3 ways that a packet from a skip_sw ruleset, can end up in the kernel path. Although the send packets to a non-existent chain way is only improved a few percents, then I believe it's worth optimizing the trap and fall-though use-cases. +----------------------------+--------+--------+--------+ | Test description | Pre- | Post- | Rel. | | | kpps | kpps | chg. | +----------------------------+--------+--------+--------+ | basic forwarding + notrack | 3589.3 | 3587.9 | 1.00x | | switch to eswitch mode | 3081.8 | 3094.7 | 1.00x | | add ingress qdisc | 3042.9 | 3063.6 | 1.01x | | tc forward in hw / skip_sw |37024.7 |37028.4 | 1.00x | | tc forward in sw / skip_hw | 3245.0 | 3245.3 | 1.00x | +----------------------------+--------+--------+--------+ | tests with only skip_sw rules below: | +----------------------------+--------+--------+--------+ | 1 non-matching rule | 2694.7 | 3058.7 | 1.14x | | 1 n-m rule, match trap | 2611.2 | 3323.1 | 1.27x | | 1 n-m rule, goto non-chain | 2886.8 | 2945.9 | 1.02x | | 5 non-matching rules | 1958.2 | 3061.3 | 1.56x | | 5 n-m rules, match trap | 1911.9 | 3327.0 | 1.74x | | 5 n-m rules, goto non-chain| 2883.1 | 2947.5 | 1.02x | | 10 non-matching rules | 1466.3 | 3062.8 | 2.09x | | 10 n-m rules, match trap | 1444.3 | 3317.9 | 2.30x | | 10 n-m rules,goto non-chain| 2883.1 | 2939.5 | 1.02x | | 25 non-matching rules | 838.5 | 3058.9 | 3.65x | | 25 n-m rules, match trap | 824.5 | 3323.0 | 4.03x | | 25 n-m rules,goto non-chain| 2875.8 | 2944.7 | 1.02x | | 50 non-matching rules | 488.1 | 3054.7 | 6.26x | | 50 n-m rules, match trap | 484.9 | 3318.5 | 6.84x | | 50 n-m rules,goto non-chain| 2884.1 | 2939.7 | 1.02x | +----------------------------+--------+--------+--------+ perf top (25 n-m skip_sw rules - pre patch): 20.39% [kernel] [k] __skb_flow_dissect 16.43% [kernel] [k] rhashtable_jhash2 10.58% [kernel] [k] fl_classify 10.23% [kernel] [k] fl_mask_lookup 4.79% [kernel] [k] memset_orig 2.58% [kernel] [k] tcf_classify 1.47% [kernel] [k] __x86_indirect_thunk_rax 1.42% [kernel] [k] __dev_queue_xmit 1.36% [kernel] [k] nft_do_chain 1.21% [kernel] [k] __rcu_read_lock perf top (25 n-m skip_sw rules - post patch): 5.12% [kernel] [k] __dev_queue_xmit 4.77% [kernel] [k] nft_do_chain 3.65% [kernel] [k] dev_gro_receive 3.41% [kernel] [k] check_preemption_disabled 3.14% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_nonlinear 2.88% [kernel] [k] __netif_receive_skb_core.constprop.0 2.49% [kernel] [k] mlx5e_xmit 2.15% [kernel] [k] ip_forward 1.95% [kernel] [k] mlx5e_tc_restore_tunnel 1.92% [kernel] [k] vlan_gro_receive Test setup: DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G Data rate measured on switch (Extreme X690), and DUT connected as a router on a stick, with pktgen and pktsink as VLANs. Pktgen-dpdk was in range 36.6-37.7 Mpps 64B packets across all tests. Full test data at https://files.fiberby.net/ast/2024/tc_skip_sw/v2_tests/ Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29net: sched: cls_api: add filter counterAsbjørn Sloth Tønnesen
Maintain a count of filters per block. Counter updates are protected by cb_lock, which is also used to protect the offload counters. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29net: sched: cls_api: add skip_sw counterAsbjørn Sloth Tønnesen
Maintain a count of skip_sw filters. This counter is protected by the cb_lock, and is updated at the same time as offloadcnt. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-28bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncsMartin KaFai Lau
The commit 7aae231ac93b ("bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE") added CONFIG_DYNAMIC_FTRACE guard because pahole was only generating btf for ftrace-able functions. The ftrace filter had already been removed from pahole, so the CONFIG_DYNAMIC_FTRACE guard can be removed. The commit 569c484f9995 ("bpf: Limit static tcp-cc functions in the .BTF_ids list to x86") has added CONFIG_X86 guard because it failed the powerpc arch which prepended a "." to the local static function, so "cubictcp_init" becomes ".cubictcp_init". "__bpf_kfunc" has been added to kfunc since then and it uses the __unused compiler attribute. There is an existing "__bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused)" test in bpf_testmod.c to cover the static kfunc case. cross compile on ppc64 with CONFIG_DYNAMIC_FTRACE disabled: > readelf -s vmlinux | grep cubictcp_ 56938: c00000000144fd00 184 FUNC LOCAL DEFAULT 2 cubictcp_cwnd_event [<localentry>: 8] 56939: c00000000144fdb8 200 FUNC LOCAL DEFAULT 2 cubictcp_recalc_[...] [<localentry>: 8] 56940: c00000000144fe80 296 FUNC LOCAL DEFAULT 2 cubictcp_init [<localentry>: 8] 56941: c00000000144ffa8 228 FUNC LOCAL DEFAULT 2 cubictcp_state [<localentry>: 8] 56942: c00000000145008c 1908 FUNC LOCAL DEFAULT 2 cubictcp_cong_avoid [<localentry>: 8] 56943: c000000001450800 1644 FUNC LOCAL DEFAULT 2 cubictcp_acked [<localentry>: 8] > bpftool btf dump file vmlinux | grep cubictcp_ [51540] FUNC 'cubictcp_acked' type_id=38137 linkage=static [51541] FUNC 'cubictcp_cong_avoid' type_id=38122 linkage=static [51543] FUNC 'cubictcp_cwnd_event' type_id=51542 linkage=static [51544] FUNC 'cubictcp_init' type_id=9186 linkage=static [51545] FUNC 'cubictcp_recalc_ssthresh' type_id=35021 linkage=static [51547] FUNC 'cubictcp_state' type_id=38141 linkage=static The patch removed both config guards. Cc: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240322191433.4133280-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28bpf: add bpf_modify_return_test_tp() kfunc triggering tracepointAndrii Nakryiko
Add a simple bpf_modify_return_test_tp() kfunc, available to all program types, that is useful for various testing and benchmarking scenarios, as it allows to trigger most tracing BPF program types from BPF side, allowing to do complex testing and benchmarking scenarios. It is also attachable to for fmod_ret programs, making it a good and simple way to trigger fmod_ret program under test/benchmark. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28bpf: Add a check for struct bpf_fib_lookup sizeAnton Protopopov
The struct bpf_fib_lookup should not grow outside of its 64 bytes. Add a static assert to validate this. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-4-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28bpf: Add support for passing mark with bpf_fib_lookupAnton Protopopov
Extend the bpf_fib_lookup() helper by making it to utilize mark if the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the four bytes of struct bpf_fib_lookup are used, shared with the output-only smac/dmac fields. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28net: remove gfp_mask from napi_alloc_skb()Jakub Kicinski
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable. This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail). The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28Merge tag 'nfsd-6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address three recently introduced regressions * tag 'nfsd-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies SUNRPC: Revert 561141dd494382217bace4d1a51d08168420eace nfsd: Fix error cleanup path in nfsd_rename()
2024-03-28inet: inet_defrag: prevent sk release while still in useFlorian Westphal
ip_local_out() and other functions can pass skb->sk as function argument. If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released. This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline. Eric Dumazet made an initial analysis of this bug. Quoting Eric: Calling ip_defrag() in output path is also implying skb_orphan(), which is buggy because output path relies on sk not disappearing. A relevant old patch about the issue was : 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") [..] net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet socket, not an arbitrary one. If we orphan the packet in ipvlan, then downstream things like FQ packet scheduler will not work properly. We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used. Eric suggested to stash sk in fragment queue and made an initial patch. However there is a problem with this: If skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments, and sets up destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accouting to reflect the fully reassembled skb, else wmem will underflow. This change moves the orphan down into the core, to last possible moment. As ip_defrag_offset is aliased with sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered. This allows to delay the orphaning long enough to learn if the skb has to be queued or if the skb is completing the reasm queue. In the former case, things work as before, skb is orphaned. This is safe because skb gets queued/stolen and won't continue past reasm engine. In the latter case, we will steal the skb->sk reference, reattach it to the head skb, and fix up wmem accouting when inet_frag inflates truesize. Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28Merge tag 'nf-24-03-28' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 reject destroy chain command to delete device hooks in netdev family, hence, only delchain commands are allowed. Patch #2 reject table flag update interference with netdev basechain hook updates, this can leave hooks in inconsistent registration/unregistration state. Patch #3 do not unregister netdev basechain hooks if table is dormant. Otherwise, splat with double unregistration is possible. Patch #4 fixes Kconfig to allow to restore IP_NF_ARPTABLES, from Kuniyuki Iwashima. There are a more fixes still in progress on my side that need more work. * tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c netfilter: nf_tables: skip netdev hook unregistration if table is dormant netfilter: nf_tables: reject table flag and netdev basechain updates netfilter: nf_tables: reject destroy command to remove basechain hooks ==================== Link: https://lore.kernel.org/r/20240328031855.2063-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.cKuniyuki Iwashima
syzkaller started to report a warning below [0] after consuming the commit 4654467dc7e1 ("netfilter: arptables: allow xtables-nft only builds"). The change accidentally removed the dependency on NETFILTER_FAMILY_ARP from IP_NF_ARPTABLES. If NF_TABLES_ARP is not enabled on Kconfig, NETFILTER_FAMILY_ARP will be removed and some code necessary for arptables will not be compiled. $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config CONFIG_NETFILTER_FAMILY_ARP=y # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y $ make olddefconfig $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y So, when nf_register_net_hooks() is called for arptables, it will trigger the splat below. Now IP_NF_ARPTABLES is only enabled by IP_NF_ARPFILTER, so let's restore the dependency on NETFILTER_FAMILY_ARP in IP_NF_ARPFILTER. [0]: WARNING: CPU: 0 PID: 242 at net/netfilter/core.c:316 nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Modules linked in: CPU: 0 PID: 242 Comm: syz-executor.0 Not tainted 6.8.0-12821-g537c2e91d354 #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Code: 83 fd 04 0f 87 bc 00 00 00 e8 5b 84 83 fd 4d 8d ac ec a8 0b 00 00 e8 4e 84 83 fd 4c 89 e8 5b 5d 41 5c 41 5d c3 e8 3f 84 83 fd <0f> 0b e8 38 84 83 fd 45 31 ed 5b 5d 4c 89 e8 41 5c 41 5d c3 e8 26 RSP: 0018:ffffc90000b8f6e8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff83c42164 RDX: ffff888106851180 RSI: ffffffff83c42321 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 000000000000000a R10: 0000000000000003 R11: ffff8881055c2f00 R12: ffff888112b78000 R13: 0000000000000000 R14: ffff8881055c2f00 R15: ffff8881055c2f00 FS: 00007f377bd78800(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000496068 CR3: 000000011298b003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> __nf_register_net_hook+0xcd/0x7a0 net/netfilter/core.c:428 nf_register_net_hook+0x116/0x170 net/netfilter/core.c:578 nf_register_net_hooks+0x5d/0xc0 net/netfilter/core.c:594 arpt_register_table+0x250/0x420 net/ipv4/netfilter/arp_tables.c:1553 arptable_filter_table_init+0x41/0x60 net/ipv4/netfilter/arptable_filter.c:39 xt_find_table_lock+0x2e9/0x4b0 net/netfilter/x_tables.c:1260 xt_request_find_table_lock+0x2b/0xe0 net/netfilter/x_tables.c:1285 get_info+0x169/0x5c0 net/ipv4/netfilter/arp_tables.c:808 do_arpt_get_ctl+0x3f9/0x830 net/ipv4/netfilter/arp_tables.c:1444 nf_getsockopt+0x76/0xd0 net/netfilter/nf_sockopt.c:116 ip_getsockopt+0x17d/0x1c0 net/ipv4/ip_sockglue.c:1777 tcp_getsockopt+0x99/0x100 net/ipv4/tcp.c:4373 do_sock_getsockopt+0x279/0x360 net/socket.c:2373 __sys_getsockopt+0x115/0x1e0 net/socket.c:2402 __do_sys_getsockopt net/socket.c:2412 [inline] __se_sys_getsockopt net/socket.c:2409 [inline] __x64_sys_getsockopt+0xbd/0x150 net/socket.c:2409 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f377beca6fe Code: 1f 44 00 00 48 8b 15 01 97 0a 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 c9 RSP: 002b:00000000005df728 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004966e0 RCX: 00007f377beca6fe RDX: 0000000000000060 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 000000000042938a R08: 00000000005df73c R09: 00000000005df800 R10: 00000000004966e8 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000496068 R14: 0000000000000003 R15: 00000000004bc9d8 </TASK> Fixes: 4654467dc7e1 ("netfilter: arptables: allow xtables-nft only builds") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28netfilter: nf_tables: skip netdev hook unregistration if table is dormantPablo Neira Ayuso
Skip hook unregistration when adding or deleting devices from an existing netdev basechain. Otherwise, commit/abort path try to unregister hooks which not enabled. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28netfilter: nf_tables: reject table flag and netdev basechain updatesPablo Neira Ayuso
netdev basechain updates are stored in the transaction object hook list. When setting on the table dormant flag, it iterates over the existing hooks in the basechain. Thus, skipping the hooks that are being added/deleted in this transaction, which leaves hook registration in inconsistent state. Reject table flag updates in combination with netdev basechain updates in the same batch: - Update table flags and add/delete basechain: Check from basechain update path if there are pending flag updates for this table. - add/delete basechain and update table flags: Iterate over the transaction list to search for basechain updates from the table update path. In both cases, the batch is rejected. Based on suggestion from Florian Westphal. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28netfilter: nf_tables: reject destroy command to remove basechain hooksPablo Neira Ayuso
Report EOPNOTSUPP if NFT_MSG_DESTROYCHAIN is used to delete hooks in an existing netdev basechain, thus, only NFT_MSG_DELCHAIN is allowed. Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-27Merge tag 'wireless-2024-03-27' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.9-rc2 The first fixes for v6.9. Ping-Ke Shih now maintains a separate tree for Realtek drivers, document that in the MAINTAINERS. Plenty of fixes for both to stack and iwlwifi. Our kunit tests were working only on um architecture but that's fixed now. * tag 'wireless-2024-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (21 commits) MAINTAINERS: wifi: mwifiex: add Francesco as reviewer kunit: fix wireless test dependencies wifi: iwlwifi: mvm: include link ID when releasing frames wifi: iwlwifi: mvm: handle debugfs names more carefully wifi: iwlwifi: mvm: guard against invalid STA ID on removal wifi: iwlwifi: read txq->read_ptr under lock wifi: iwlwifi: fw: don't always use FW dump trig wifi: iwlwifi: mvm: rfi: fix potential response leaks wifi: mac80211: correctly set active links upon TTLM wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW wifi: iwlwifi: mvm: consider having one active link wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF wifi: mac80211: fix prep_connection error path wifi: cfg80211: fix rdev_dump_mpp() arguments order wifi: iwlwifi: mvm: disable MLO for the time being wifi: cfg80211: add a flag to disable wireless extensions wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes wifi: mac80211: fix mlme_link_id_dbg() MAINTAINERS: wifi: add git tree for Realtek WiFi drivers ... ==================== Link: https://lore.kernel.org/r/20240327191346.1A1EAC433C7@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-27Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-03-25 We've added 38 non-merge commits during the last 13 day(s) which contain a total of 50 files changed, 867 insertions(+), 274 deletions(-). The main changes are: 1) Add the ability to specify and retrieve BPF cookie also for raw tracepoint programs in order to ease migration from classic to raw tracepoints, from Andrii Nakryiko. 2) Allow the use of bpf_get_{ns_,}current_pid_tgid() helper for all program types and add additional BPF selftests, from Yonghong Song. 3) Several improvements to bpftool and its build, for example, enabling libbpf logs when loading pid_iter in debug mode, from Quentin Monnet. 4) Check the return code of all BPF-related set_memory_*() functions during load and bail out in case they fail, from Christophe Leroy. 5) Avoid a goto in regs_refine_cond_op() such that the verifier can be better integrated into Agni tool which doesn't support backedges yet, from Harishankar Vishwanathan. 6) Add a small BPF trie perf improvement by always inlining longest_prefix_match, from Jesper Dangaard Brouer. 7) Small BPF selftest refactor in bpf_tcp_ca.c to utilize start_server() helper instead of open-coding it, from Geliang Tang. 8) Improve test_tc_tunnel.sh BPF selftest to prevent client connect before the server bind, from Alessandro Carminati. 9) Fix BPF selftest benchmark for older glibc and use syscall(SYS_gettid) instead of gettid(), from Alan Maguire. 10) Implement a backward-compatible method for struct_ops types with additional fields which are not present in older kernels, from Kui-Feng Lee. 11) Add a small helper to check if an instruction is addr_space_cast from as(0) to as(1) and utilize it in x86-64 JIT, from Puranjay Mohan. 12) Small cleanup to remove unnecessary error check in bpf_struct_ops_map_update_elem, from Martin KaFai Lau. 13) Improvements to libbpf fd validity checks for BPF map/programs, from Mykyta Yatsenko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (38 commits) selftests/bpf: Fix flaky test btf_map_in_map/lookup_update bpf: implement insn_is_cast_user() helper for JITs bpf: Avoid get_kernel_nofault() to fetch kprobe entry IP selftests/bpf: Use start_server in bpf_tcp_ca bpf: Sync uapi bpf.h to tools directory libbpf: Add new sec_def "sk_skb/verdict" selftests/bpf: Mark uprobe trigger functions with nocf_check attribute selftests/bpf: Use syscall(SYS_gettid) instead of gettid() wrapper in bench bpf-next: Avoid goto in regs_refine_cond_op() bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool selftests/bpf: scale benchmark counting by using per-CPU counters bpftool: Remove unnecessary source files from bootstrap version bpftool: Enable libbpf logs when loading pid_iter in debug mode selftests/bpf: add raw_tp/tp_btf BPF cookie subtests libbpf: add support for BPF cookie for raw_tp/tp_btf programs bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs bpf: pass whole link instead of prog when triggering raw tracepoint bpf: flatten bpf_probe_register call chain selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh selftests/bpf: Add a sk_msg prog bpf_get_ns_current_pid_tgid() test ... ==================== Link: https://lore.kernel.org/r/20240325233940.7154-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26tls: get psock ref after taking rxlock to avoid leakSabrina Dubroca
At the start of tls_sw_recvmsg, we take a reference on the psock, and then call tls_rx_reader_lock. If that fails, we return directly without releasing the reference. Instead of adding a new label, just take the reference after locking has succeeded, since we don't need it before. Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26tls: adjust recv return with async crypto and failed copy to userspaceSabrina Dubroca
process_rx_list may not copy as many bytes as we want to the userspace buffer, for example in case we hit an EFAULT during the copy. If this happens, we should only count the bytes that were actually copied, which may be 0. Subtracting async_copy_bytes is correct in both peek and !peek cases, because decrypted == async_copy_bytes + peeked for the peek case: peek is always !ZC, and we can go through either the sync or async path. In the async case, we add chunk to both decrypted and async_copy_bytes. In the sync case, we add chunk to both decrypted and peeked. I missed that in commit 6caaf104423d ("tls: fix peeking with sync+async decryption"). Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26tls: recv: process_rx_list shouldn't use an offset with kvecSabrina Dubroca
Only MSG_PEEK needs to copy from an offset during the final process_rx_list call, because the bytes we copied at the beginning of tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed data from the rx_list as we were copying it, so there's no need to use an offset, just like in the normal case. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26net: pin system percpu page_pools to the corresponding NUMA nodesAlexander Lobakin
System page_pools are percpu and one instance can be used only on one CPU. %NUMA_NO_NODE is fine for allocating pages, as the PP core always allocates local pages in this case. But for the struct &page_pool itself, this node ID means they are allocated on the boot CPU, which may belong to a different node than the target CPU. Pin system page_pools to the corresponding nodes when creating, so that all the allocated data will always be local. Use cpu_to_mem() to account memless nodes. Nodes != 0 win some Kpps when testing with xdp-trafficgen. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240325160635.3215855-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26net: remove skb_free_datagram_locked()Eric Dumazet
Last user of skb_free_datagram_locked() went away in 2016 with commit 850cbaddb52d ("udp: use it's own memory accounting schema"). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240325134155.620531-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: Rename rps_lock to backlog_lock.Sebastian Andrzej Siewior
The rps_lock.*() functions use the inner lock of a sk_buff_head for locking. This lock is used if RPS is enabled, otherwise the list is accessed lockless and disabling interrupts is enough for the synchronisation because it is only accessed CPU local. Not only the list is protected but also the NAPI state protected. With the addition of backlog threads, the lock is also needed because of the cross CPU access even without RPS. The clean up of the defer_list list is also done via backlog threads (if enabled). It has been suggested to rename the locking function since it is no longer just RPS. Rename the rps_lock*() functions to backlog_lock*(). Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: Use backlog-NAPI to clean up the defer_list.Sebastian Andrzej Siewior
The defer_list is a per-CPU list which is used to free skbs outside of the socket lock and on the CPU on which they have been allocated. The list is processed during NAPI callbacks so ideally the list is cleaned up. Should the amount of skbs on the list exceed a certain water mark then the softirq is triggered remotely on the target CPU by invoking a remote function call. The raise of the softirqs via a remote function call leads to waking the ksoftirqd on PREEMPT_RT which is undesired. The backlog-NAPI threads already provide the infrastructure which can be utilized to perform the cleanup of the defer_list. The NAPI state is updated with the input_pkt_queue.lock acquired. It order not to break the state, it is needed to also wake the backlog-NAPI thread with the lock held. This requires to acquire the use the lock in rps_lock_irq*() if the backlog-NAPI threads are used even with RPS disabled. Move the logic of remotely starting softirqs to clean up the defer_list into kick_defer_list_purge(). Make sure a lock is held in rps_lock_irq*() if backlog-NAPI threads are used. Schedule backlog-NAPI for defer_list cleanup if backlog-NAPI is available. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: Allow to use SMP threads for backlog NAPI.Sebastian Andrzej Siewior
Backlog NAPI is a per-CPU NAPI struct only (with no device behind it) used by drivers which don't do NAPI them self, RPS and parts of the stack which need to avoid recursive deadlocks while processing a packet. The non-NAPI driver use the CPU local backlog NAPI. If RPS is enabled then a flow for the skb is computed and based on the flow the skb can be enqueued on a remote CPU. Scheduling/ raising the softirq (for backlog's NAPI) on the remote CPU isn't trivial because the softirq is only scheduled on the local CPU and performed after the hardirq is done. In order to schedule a softirq on the remote CPU, an IPI is sent to the remote CPU which schedules the backlog-NAPI on the then local CPU. On PREEMPT_RT interrupts are force-threaded. The soft interrupts are raised within the interrupt thread and processed after the interrupt handler completed still within the context of the interrupt thread. The softirq is handled in the context where it originated. With force-threaded interrupts enabled, ksoftirqd is woken up if a softirq is raised from hardirq context. This is the case if it is raised from an IPI. Additionally there is a warning on PREEMPT_RT if the softirq is raised from the idle thread. This was done for two reasons: - With threaded interrupts the processing should happen in thread context (where it originated) and ksoftirqd is the only thread for this context if raised from hardirq. Using the currently running task instead would "punish" a random task. - Once ksoftirqd is active it consumes all further softirqs until it stops running. This changed recently and is no longer the case. Instead of keeping the backlog NAPI in ksoftirqd (in force-threaded/ PREEMPT_RT setups) I am proposing NAPI-threads for backlog. The "proper" setup with threaded-NAPI is not doable because the threads are not pinned to an individual CPU and can be modified by the user. Additionally a dummy network device would have to be assigned. Also CPU-hotplug has to be considered if additional CPUs show up. All this can be probably done/ solved but the smpboot-threads already provide this infrastructure. Sending UDP packets over loopback expects that the packet is processed within the call. Delaying it by handing it over to the thread hurts performance. It is not beneficial to the outcome if the context switch happens immediately after enqueue or after a while to process a few packets in a batch. There is no need to always use the thread if the backlog NAPI is requested on the local CPU. This restores the loopback throuput. The performance drops mostly to the same value after enabling RPS on the loopback comparing the IPI and the tread result. Create NAPI-threads for backlog if request during boot. The thread runs the inner loop from napi_threaded_poll(), the wait part is different. It checks for NAPI_STATE_SCHED (the backlog NAPI can not be disabled). The NAPI threads for backlog are optional, it has to be enabled via the boot argument "thread_backlog_napi". It is mandatory for PREEMPT_RT to avoid the wakeup of ksoftirqd from the IPI. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: Remove conditional threaded-NAPI wakeup based on task state.Sebastian Andrzej Siewior
A NAPI thread is scheduled by first setting NAPI_STATE_SCHED bit. If successful (the bit was not yet set) then the NAPI_STATE_SCHED_THREADED is set but only if thread's state is not TASK_INTERRUPTIBLE (is TASK_RUNNING) followed by task wakeup. If the task is idle (TASK_INTERRUPTIBLE) then the NAPI_STATE_SCHED_THREADED bit is not set. The thread is no relying on the bit but always leaving the wait-loop after returning from schedule() because there must have been a wakeup. The smpboot-threads implementation for per-CPU threads requires an explicit condition and does not support "if we get out of schedule() then there must be something to do". Removing this optimisation simplifies the following integration. Set NAPI_STATE_SCHED_THREADED unconditionally on wakeup and rely on it in the wait path by removing the `woken' condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-25tcp: properly terminate timers for kernel socketsEric Dumazet
We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled. Fortunately Josef Bacik could trigger the issue more often, and could test a patch I wrote two years ago. When TCP sockets are closed, we call inet_csk_clear_xmit_timers() to 'stop' the timers. inet_csk_clear_xmit_timers() can be called from any context, including when socket lock is held. This is the reason it uses sk_stop_timer(), aka del_timer(). This means that ongoing timers might finish much later. For user sockets, this is fine because each running timer holds a reference on the socket, and the user socket holds a reference on the netns. For kernel sockets, we risk that the netns is freed before timer can complete, because kernel sockets do not hold reference on the netns. This patch adds inet_csk_clear_xmit_timers_sync() function that using sk_stop_timer_sync() to make sure all timers are terminated before the kernel socket is released. Modules using kernel sockets close them in their netns exit() handler. Also add sock_not_owned_by_me() helper to get LOCKDEP support : inet_csk_clear_xmit_timers_sync() must not be called while socket lock is held. It is very possible we can revert in the future commit 3a58f13a881e ("net: rds: acquire refcount on TCP sockets") which attempted to solve the issue in rds only. (net/smc/af_smc.c and net/mptcp/subflow.c have similar code) We probably can remove the check_net() tests from tcp_out_of_resources() and __tcp_close() in the future. Reported-by: Josef Bacik <josef@toxicpanda.com> Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/ Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Josef Bacik <josef@toxicpanda.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25net: hsr: hsr_slave: Fix the promiscuous mode in offload modeRavi Gunasekaran
commit e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") disables promiscuous mode of slave devices while creating an HSR interface. But while deleting the HSR interface, it does not take care of it. It decreases the promiscuous mode count, which eventually enables promiscuous mode on the slave devices when creating HSR interface again. Fix this by not decrementing the promiscuous mode count while deleting the HSR interface when offload is enabled. Fixes: e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode") Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25net: mark racy access on sk->sk_rcvbuflinke li
sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be changed by other threads. Mark this as benign using READ_ONCE(). This patch is aimed at reducing the number of benign races reported by KCSAN in order to focus future debugging effort on harmful races. Signed-off-by: linke li <lilinke99@qq.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-25net: rfkill: gpio: Convert to platform remove callback returning voidUwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/20240306183538.88777-2-u.kleine-koenig@pengutronix.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: use kvcalloc() for codel varsJohannes Berg
This is a big array, but it's only used by software and need not be contiguous in memory. Use kvcalloc() since it's so big (order 5 allocation). Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240325150509.9195643699e4.I1b94b17abc809491080d6312f31ce6b5decdd446@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: reactivate multi-link later in restartJohannes Berg
In case of restart, we currently reactivate multi-link on interfaces before reconfiguring keys etc. which means the drivers need to handle this case differently. Enable more links later to allow them to handle it the same way. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.d0f18a56335d.Ib3338d93872a4a568f38db0d02546534d3eff810@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: improve drop for action frame returnJohannes Berg
If we use a drop we not only save the extra call to dev_kfree_skb(), but also have a better reason in tracing, so do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.34daf0a89eb4.I60e0639511f9de64e40e6105b640adf90f8f57f7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: don't ask driver about no-op link changesJohannes Berg
If the links won't actually change, nothing will happen. This was previously done in the inner function (twice in some cases), but we shouldn't bother the driver with it. Clean that up. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a8190a312a27.If4e6f5ce8228eda7afac0fc8c17dd731c5da9ed9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: don't select link ID if not provided in scan requestAyala Beker
If scan request doesn't include a link ID to be used for TSF reporting, don't select it as it might become inactive before scan is actually started by the driver. Instead, let the driver select one of the active links. Fixes: cbde0b49f276 ("wifi: mac80211: Extend support for scanning while MLO connected") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a6b643a15755.Ic28ed9a611432387b7f85e9ca9a97a4ce34a6e0f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: clarify IEEE80211_STATUS_SUBDATA_MASKJohannes Berg
We have 13 bits for the status_data, so restricting type to 4 and subdata to 8 bits is confusing, even if we don't need more bits now. Change subdata mask to be 9 bits instead, just to make things match up. If we actually need more types or more subdata bits we can later also reshuffle the bits between these, but we should probably keep them at 13 bits together. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.28ac7b665039.I1abbb13e90f016cab552492e05f5cb5b52de6463@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: don't enter idle during link switchJohannes Berg
When doing link switch with a disjoint set of links before and after the switch, we end up removing all channel contexts, adding new ones later. This looks like 'idle' to the code now, and we enter idle which also includes flushing queues. But we can't actually flush since we don't have a link active (bound to a channel context), and entering idle just to leave it again is also wrong. Fix this by passing through an indication that we shouldn't do any idle checks in this case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.170328bac555.If4a522a9dd3133b91983854b909a4de13aa635da@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: add support for tearing down negotiated TTLMAyala Beker
In order to activate a link that is currently inactive due to a negotiated TTLM request, need to first tear down the negotiated TTLM request. Add support for sending TTLM teardown request and update the links state accordingly. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.d480cbf46fcf.Idedad472469d2c27dd2a088cf80a13a1e1cf9b78@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: ignore non-TX BSSs in per-STA profileBenjamin Berg
If a non-TX BSS is included in a per-STA profile, then we cannot set transmitted_bss for it. Even worse, if we do things properly we should be configuring both bssid_index and max_bssid_indicator correctly. We do not actually have both pieces of information (and, some APs currently do not include either). So, ignore any per-STA profile where the RNR says that the BSS is not transmitted. Also fix transmitted_bss to never be set for per-STA profiles. This fixes issues where mac80211 was setting the reference BSSID to an incorrect value. Fixes: 2481b5da9c6b ("wifi: cfg80211: handle BSS data contained in ML probe responses") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.6a0babed655a.Iad447fea417c63f683da793556b97c31d07a4aab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: check BSSID Index against MaxBSSIDBenjamin Berg
Add a verification that the BSSID Index does not exceed the maximum number of BSSIDs in the Multiple-BSSID set. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.a7574d415adc.I02f40c2920a9f602898190679cc27d0c8ee2c67d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: improve association error reporting slightlyBenjamin Berg
There is no reason to check the request flags for each of the links, so pull that out of the loop. Also, within the loop we can set the per-link error everywhere. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.695faa9be279.I71b11a8d66a9cae4c27e242a47d1d92922609b03@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: add flag to disallow puncturing in 5 GHzJohannes Berg
Some devices may not be capable of handling puncturing in 5 GHz only (vs. the current flag that just removes puncturing support completely). Add a flag to support such devices: check and then downgrade the channel width if needed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.49759510da7d.I12c5a61f0be512e0c4e574c2f794ef4b37ecaf6b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: handle indoor AFC/LPI AP in probe response and beaconAnjaneyulu
Mark Indoor LPI and Indoor AFC power types as valid based on channel flags. While on it, added default case. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.091cfaaa5f45.I23cfa1104a16fd4eb9751b3d0d7b158db4ff3ecd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: handle indoor AFC/LPI AP on assoc successAnjaneyulu
Update power_type in bss_conf based on Indoor AFC and LPI power types received in HE 6 GHz operation element on assoc success. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.89c25dae34ff.Ifd8b2983f400623ac03dc032fc9a20025c9ca365@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>