path: root/net
Age  Commit message  Author
2024-04-22  net: openvswitch: Check vport netdev name  (Jun Gu)
Ensure that the provided netdev name is not one of its aliases to prevent unnecessary creation and destruction of the vport by ovs-vswitchd. Signed-off-by: Jun Gu <jun.gu@easystack.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
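A minimal sketch of the idea described above; the helper context and variable names are illustrative, not the exact patch:

    /* dev_get_by_name() also resolves alternate names (altnames), so a
     * lookup can succeed even though the caller passed an alias rather
     * than the primary netdev name.  Reject that case so ovs-vswitchd
     * does not keep destroying and re-creating the vport.
     */
    struct net_device *dev = dev_get_by_name(net, parms->name);

    if (!dev)
        return ERR_PTR(-ENODEV);
    if (strcmp(dev->name, parms->name)) {
        dev_put(dev);
        return ERR_PTR(-ENODEV);
    }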
2024-04-22  netfilter: nfnetlink: Handle ACK flags for batch messages  (Donald Hunter)
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end messages. This is a problem for ynl which wants to receive an ack for every message it sends, not just the commands in between the begin/end messages. Add processing for ACKs for begin/end messages and provide responses when requested. I have checked that iproute2, pyroute2 and systemd are unaffected by this change since none of them use NLM_F_ACK for batch begin/end. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
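A rough sketch of the added handling, using the standard netlink_ack() helper; the exact placement inside the nfnetlink batch receive path is simplified here:

    /* batch begin/end carry nothing to dump back, so a plain ACK
     * (error code 0) is enough when the sender asked for one */
    if (nlh->nlmsg_flags & NLM_F_ACK)
        netlink_ack(skb, nlh, 0, NULL);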
2024-04-22  Merge branch 'for-uring-ubufops' into HEAD  (Jakub Kicinski)
Pavel Begunkov says:

====================
implement io_uring notification (ubuf_info) stacking (net part)

To have per-request buffer notifications, each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, this may force the stack to create many small skbs, hurting performance in many ways.

The patchset implements notification (i.e. io_uring's ubuf_info extension) stacking: it attempts to link ubuf_infos into a list, allowing multiple of them per skb.

liburing/examples/send-zerocopy shows up to a 6x performance improvement for TCP with 4KB per send, and levels it with MSG_ZEROCOPY. Without the patchset, much larger sends are required to utilise the full potential.

bytes | before | after (Kqps)
 1200 |    195 |   1023
 4000 |    193 |   1386
 8000 |    154 |   1058
====================

Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22  net: add callback for setting a ubuf_info to skb  (Pavel Begunkov)
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning a ubuf_info to an skb; this way we can later implement smarter assignment, such as linking ubuf_infos together. Note that it is an optional callback, and it should be compatible with skb_zcopy_set(), because the net stack might potentially decide to clone an skb and take another reference to the ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without a prior ubuf_info, otherwise we could end up in a situation where the send would not be able to progress. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
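A sketch of how such an optional callback could be consumed on the TX path, assuming the ops structure from the entry below; the helper name and exact semantics are illustrative:

    static int skb_zcopy_link(struct sk_buff *skb, struct ubuf_info *uarg)
    {
        /* optional hook: lets io_uring stack several notifications */
        if (uarg->ops && uarg->ops->link_skb)
            return uarg->ops->link_skb(skb, uarg);

        /* default behaviour: one ubuf_info per skb, take a reference */
        skb_zcopy_set(skb, uarg, NULL);
        return 0;
    }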
2024-04-22  net: extend ubuf_info callback to ops structure  (Pavel Begunkov)
We'll need to associate additional callbacks with ubuf_info, so introduce a structure holding ubuf_info callbacks. Apart from the smarter io_uring notification management introduced in the next patches, it can be used to generalise msg_zerocopy_put_abort() and also to store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
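For illustration, the ops container could look roughly like this; a sketch following the description above, not a verified copy of the header:

    struct ubuf_info_ops {
        /* completion callback, previously embedded in ubuf_info */
        void (*complete)(struct sk_buff *skb, struct ubuf_info *uarg,
                         bool zerocopy_success);
        /* optional skb binding hook from the entry above */
        int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg);
    };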
2024-04-22  tcp: try to send bigger TSO packets  (Eric Dumazet)
While investigating TCP performance, I found that TCP would sometimes send big skbs followed by a single MSS skb, in a 'locked' pattern.

For instance, BIG TCP is enabled and MSS is set to have 4096 bytes of payload per segment; gso_max_size is set to 181000. This means that an optimal TCP packet size should contain 44 * 4096 = 180224 bytes of payload. However, I was seeing packet sizes interleaved in this pattern: 172032, 8192, 172032, 8192, 172032, 8192, <repeat>

The tcp_tso_should_defer() heuristic is defeated, because after a split of a packet in the write queue for whatever reason (this might be a too small CWND or a small enough pacing_rate), the leftover packet in the queue is smaller than the optimal size. It is time to try to make 'leftover packets' bigger so that tcp_tso_should_defer() can reach its full potential.

After this patch, we can see the following output:

14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980
14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980
14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980
14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980
14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980
14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980
14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980
14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980
14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980
14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980
14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980

With 200 concurrent flows on a 100Gbit NIC, we can see a reduction of TSO packets (and ACK packets) of about 30%.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22  tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()  (Eric Dumazet)
tcp_write_xmit() calls tcp_init_tso_segs() to set gso_size and gso_segs on the packet. tcp_init_tso_segs() requires the stack to maintain an up to date tcp_skb_pcount(), and this makes sense for packets in rtx queue. Not so much for packets still in the write queue. In the following patch, we don't want to deal with tcp_skb_pcount() when moving payload from 2nd skb to 1st skb in the write queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22  tcp: remove dubious FIN exception from tcp_cwnd_test()  (Eric Dumazet)
tcp_cwnd_test() has special handling for the last packet in the write queue if it is smaller than one MSS and has the FIN flag. This is in violation of the TCP RFC, and seems quite dubious. This packet can be sent only if the current CWND is bigger than the number of packets in flight. Making tcp_cwnd_test() result independent of the first skb in the write queue is needed for the last patch of the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
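For reference, the removed exception looked roughly like this; a paraphrased sketch (the real function returns a segment count, not a bool), not the verbatim kernel code:

    static bool tcp_cwnd_test_sketch(const struct tcp_sock *tp,
                                     const struct sk_buff *skb)
    {
        /* the dubious special case: let a sub-MSS FIN segment out
         * even when the congestion window is already full */
        if (skb && tcp_skb_pcount(skb) == 1 &&
            (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
            return true;

        return tcp_packets_in_flight(tp) < tcp_snd_cwnd(tp);
    }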
2024-04-22  devlink: extend devlink_param *set pointer  (Mateusz Polchlopek)
Extend the devlink_param *set function pointer to take extack as a parameter. Sometimes the set function needs to pass information to the end user, and it is more appropriate to use netlink (extack) for that than to print a message to dmesg. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
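A sketch of the widened callback signature as described above; shown for illustration, not verified against the final header:

    /* before */
    int (*set)(struct devlink *devlink, u32 id,
               struct devlink_param_gset_ctx *ctx);

    /* after: drivers can report problems via extack instead of dmesg */
    int (*set)(struct devlink *devlink, u32 id,
               struct devlink_param_gset_ctx *ctx,
               struct netlink_ext_ack *extack);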
2024-04-22  Merge tag 'nfsd-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux  (Linus Torvalds)
Pull nfsd fix from Chuck Lever:
- Fix an NFS/RDMA performance regression in v6.9-rc
* tag 'nfsd-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"
2024-04-22  xdp: use flags field to disambiguate broadcast redirect  (Toke Høiland-Jørgensen)
When redirecting a packet using XDP, the bpf_redirect_map() helper will set up the redirect destination information in struct bpf_redirect_info (using the __bpf_xdp_redirect_map() helper function), and the xdp_do_redirect() function will read this information after the XDP program returns and pass the frame on to the right redirect destination. When using the BPF_F_BROADCAST flag to do multicast redirect to a whole map, __bpf_xdp_redirect_map() sets the 'map' pointer in struct bpf_redirect_info to point to the destination map to be broadcast. And xdp_do_redirect() reacts to the value of this map pointer to decide whether it's dealing with a broadcast or a single-value redirect. However, if the destination map is being destroyed before xdp_do_redirect() is called, the map pointer will be cleared out (by bpf_clear_redirect_map()) without waiting for any XDP programs to stop running. This causes xdp_do_redirect() to think that the redirect was to a single target, but the target pointer is also NULL (since broadcast redirects don't have a single target), so this causes a crash when a NULL pointer is passed to dev_map_enqueue(). To fix this, change xdp_do_redirect() to react directly to the presence of the BPF_F_BROADCAST flag in the 'flags' value in struct bpf_redirect_info to disambiguate between a single-target and a broadcast redirect. And only read the 'map' pointer if the broadcast flag is set, aborting if that has been cleared out in the meantime. This prevents the crash, while keeping the atomic (cmpxchg-based) clearing of the map pointer itself, and without adding any more checks in the non-broadcast fast path. Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support") Reported-and-tested-by: syzbot+af9492708df9797198d6@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20240418071840.156411-1-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
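A condensed sketch of the new decision logic for devmaps, with error handling and other map types omitted; treat the surrounding variables as illustrative:

    if (ri->flags & BPF_F_BROADCAST) {
        /* only look at ri->map for broadcasts; it may have been
         * cleared by bpf_clear_redirect_map() if the map is dying */
        map = READ_ONCE(ri->map);
        if (unlikely(!map))
            return -ENOENT;
        err = dev_map_enqueue_multi(xdpf, dev, map,
                                    ri->flags & BPF_F_EXCLUDE_INGRESS);
    } else {
        /* single-target redirect: fwd is the destination entry */
        err = dev_map_enqueue(fwd, xdpf, dev);
    }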
2024-04-22  bridge/br_netlink.c: no need to return void function  (Hangbin Liu)
br_info_notify is a void function. There is no need to return. Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22  tcp: do not export tcp_twsk_purge()  (Eric Dumazet)
After commit 1eeb50435739 ("tcp/dccp: do not care about families in inet_twsk_purge()") tcp_twsk_purge() is no longer potentially called from a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22  icmp: prevent possible NULL dereferences from icmp_build_probe()  (Eric Dumazet)
First problem is a double call to __in_dev_get_rcu(), because the second one could return NULL:

    if (__in_dev_get_rcu(dev) && __in_dev_get_rcu(dev)->ifa_list)

Second problem is a read from dev->ip6_ptr with no NULL check:

    if (!list_empty(&rcu_dereference(dev->ip6_ptr)->addr_list))

Use the correct RCU API to fix these.

v2: add missing include <net/addrconf.h>

Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
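A sketch of the safer pattern using the proper RCU accessors; local variable names are illustrative:

    struct in_device *in_dev;
    struct inet6_dev *in6_dev;

    /* call __in_dev_get_rcu() once and check the result */
    in_dev = __in_dev_get_rcu(dev);
    if (in_dev && rcu_access_pointer(in_dev->ifa_list)) {
        /* the device has at least one IPv4 address */
    }

    /* __in6_dev_get() hides the raw dev->ip6_ptr access and may
     * return NULL, so check before touching addr_list */
    in6_dev = __in6_dev_get(dev);
    if (in6_dev && !list_empty(&in6_dev->addr_list)) {
        /* the device has at least one IPv6 address */
    }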
2024-04-22  sysctl: treewide: constify ctl_table_header::ctl_table_arg  (Thomas Weißschuh)
To be able to constify instances of struct ctl_table it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header. Constify this reference as a prerequisite for the full constification of struct ctl_table instances. No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-20  Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"  (Chuck Lever)
Performance regression reported with NFS/RDMA using Omnipath, bisected to commit e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain").

Tracing on the server reports:

    nfsd-7771 [060] 1758.891809: svcrdma_sq_post_err: cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12

sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is larger than rdma->sc_sq_depth (851). The number of available Send Queue entries is always supposed to be smaller than the Send Queue depth. That seems like a Send Queue accounting bug in svcrdma.

As it's getting to be late in the 6.9-rc cycle, revert this commit. It can be revisited in a subsequent kernel release.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
Fixes: e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-19  udp: preserve the connected status if only UDP cmsg  (Yick Xie)
If "udp_cmsg_send()" returned 0 (i.e. only UDP cmsg), "connected" should not be set to 0. Otherwise it stops the connected socket from using the cached route. Fixes: 2e8de8576343 ("udp: add gso segment cmsg") Signed-off-by: Yick Xie <yick.xie@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240418170610.867084-1-yick.xie@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19  neighbour: no longer hold RTNL in neigh_dump_info()  (Eric Dumazet)
neigh_dump_table() is already relying on RCU protection. pneigh_dump_table() is using its own protection (tbl->lock). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  neighbour: fix neigh_dump_info() return value  (Eric Dumazet)
Change neigh_dump_table() and pneigh_dump_table() to either return 0 or -EMSGSIZE if not enough space was available in the skb. Then neigh_dump_info() can do the same. This allows NLMSG_DONE to be appended to the current skb at the end of a dump, saving a couple of recvmsg() system calls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  neighbour: add RCU protection to neigh_tables[]  (Eric Dumazet)
In order to remove RTNL protection from neightbl_dump_info() and neigh_dump_info() later, we need to add RCU protection to neigh_tables[]. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
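A sketch of the resulting reader pattern; illustrative only, and the writer side would pair this with rcu_assign_pointer():

    int i;

    rcu_read_lock();
    for (i = 0; i < NEIGH_NR_TABLES; i++) {
        struct neigh_table *tbl = rcu_dereference(neigh_tables[i]);

        if (!tbl)
            continue;
        /* walk this table's entries without holding RTNL */
    }
    rcu_read_unlock();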
2024-04-19  net: rps: locklessly access rflow->cpu  (Jason Xing)
This is the last member in struct rps_dev_flow which should be protected locklessly. So finish it. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net: rps: protect filter locklessly  (Jason Xing)
As we can see, rflow->filter can be written/read concurrently, so lockless access is needed. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
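The pairing boils down to something like this sketch; only the field accesses are shown, the surrounding logic is omitted:

    /* writer side, e.g. when steering the flow */
    WRITE_ONCE(rflow->filter, filter_id);

    /* lockless reader side, e.g. rps_may_expire_flow() */
    if (READ_ONCE(rflow->filter) == filter_id) {
        /* flow is still steered to this filter */
    }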
2024-04-19  net: rps: protect last_qtail with rps_input_queue_tail_save() helper  (Jason Xing)
Remove one unnecessary reader protection and add another writer protection to finish the lockless protection job. Note: the removed READ_ONCE() is not needed because we only have to protect the lockless reader in the different context (rps_may_expire_flow()). Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
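Assuming the helper named in the subject, it looks roughly like the following sketch:

    static inline void rps_input_queue_tail_save(u32 *dest, u32 tail)
    {
    #ifdef CONFIG_RPS
        /* paired with a READ_ONCE() on the lockless reader side */
        WRITE_ONCE(*dest, tail);
    #endif
    }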
2024-04-19  net_sched: sch_skbprio: implement lockless skbprio_dump()  (Eric Dumazet)
Instead of relying on RTNL, skbprio_dump() can use READ_ONCE() annotation, paired with WRITE_ONCE() one in skbprio_change(). Also add a READ_ONCE(sch->limit) in skbprio_enqueue(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
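The same READ_ONCE()/WRITE_ONCE() pairing is used by the rest of this series below (sch_pie, sch_hhf, sch_hfsc, and so on). A minimal sketch of the pattern, using sch->limit as the example field rather than the exact skbprio code:

    /* writer, called under the qdisc lock from the change() handler */
    WRITE_ONCE(sch->limit, ctl->limit);

    /* reader, now allowed to run without RTNL from the dump() handler */
    opt.limit = READ_ONCE(sch->limit);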
2024-04-19  net_sched: sch_pie: implement lockless pie_dump()  (Eric Dumazet)
Instead of relying on RTNL, pie_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in pie_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_hhf: implement lockless hhf_dump()  (Eric Dumazet)
Instead of relying on RTNL, hhf_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in hhf_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_hfsc: implement lockless accesses to q->defcls  (Eric Dumazet)
Instead of relying on RTNL, hfsc_dump_qdisc() can use READ_ONCE() annotation, paired with WRITE_ONCE() one in hfsc_change_qdisc(). Use READ_ONCE(q->defcls) in hfsc_classify() to no longer acquire qdisc lock from hfsc_change_qdisc(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_fq_pie: implement lockless fq_pie_dump()  (Eric Dumazet)
Instead of relying on RTNL, fq_pie_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in fq_pie_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_fq_codel: implement lockless fq_codel_dump()  (Eric Dumazet)
Instead of relying on RTNL, fq_codel_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in fq_codel_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_fifo: implement lockless __fifo_dump()  (Eric Dumazet)
Instead of relying on RTNL, __fifo_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in __fifo_init(). Also add missing READ_ONCE(sch->limit) in bfifo_enqueue(), pfifo_enqueue() and pfifo_tail_enqueue(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_ets: implement lockless ets_dump()  (Eric Dumazet)
Instead of relying on RTNL, ets_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in ets_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_etf: implement lockless etf_dump()  (Eric Dumazet)
Instead of relying on RTNL, etf_dump() can use READ_ONCE() annotations. There is no etf_change() yet; this patch simply aligns this qdisc with the others. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_codel: implement lockless codel_dump()  (Eric Dumazet)
Instead of relying on RTNL, codel_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in codel_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_choke: implement lockless choke_dump()  (Eric Dumazet)
Instead of relying on RTNL, choke_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in choke_change(). v2: added a WRITE_ONCE(p->Scell_log, Scell_log) per Simon feedback in V1; removed the READ_ONCE(q->limit) in choke_enqueue(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_cbs: implement lockless cbs_dump()  (Eric Dumazet)
Instead of relying on RTNL, cbs_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in cbs_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: cake: implement lockless cake_dump()  (Eric Dumazet)
Instead of relying on RTNL, cake_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in cake_change(). v2: addressed Simon feedback in V1: https://lore.kernel.org/netdev/20240417083549.GA3846178@kernel.org/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Toke Høiland-Jørgensen <toke@toke.dk> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  net_sched: sch_fq: implement lockless fq_dump()  (Eric Dumazet)
Instead of relying on RTNL, fq_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() in fq_change() v2: Addressed Simon feedback in V1: https://lore.kernel.org/netdev/20240416181915.GT2320920@kernel.org/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-19  wifi: mac80211: handle link ID during management Tx  (Sriram R)
During non-STA management Tx, when the source address is the same as one of the link addresses, the link ID is not set in the TX control information, even when userspace requested Tx on a specific link. Now if the MLD address is also the same as the link address, then mac80211 fills the link as "unspecified", since it looks like MLD TX. This is unexpected, however, since non-STA TX must specify which link to use. In hwsim, this will (after warnings) result in dropping such frames as well. Use and set the link ID if the link BSS matches the address and the requested channel. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240410052705.169865-1-quic_adisi@quicinc.com Link: https://lore.kernel.org/r/0496fb7e-53cc-476f-8052-985d82fd8d01@quicinc.com [reword commit message, should spell out hwsim etc.] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: handle sdata->u.ap.active flag with MLO  (Aditya Kumar Singh)
Currently, whenever a link AP beacon is assigned, the sdata->u.ap.active flag is set, and whenever it is brought down, the flag is reset. However, with MLO, all the links of the same MLD would use the same sdata, hence there is no need to set/reset it for each link up/down. Also, resetting it when only one of the links went down is not desirable. Add changes to set the active flag only when the first link is assigned a beacon. Similarly, reset the flag only when the last link is brought down. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240409094017.3165560-1-quic_adisi@quicinc.com [remove unnecessary check before constant true assignment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: cfg80211: add return docs for regulatory functions  (Johannes Berg)
Add return value documentation for regulatory functions that are missing it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: cfg80211: make some regulatory functions void  (Johannes Berg)
The return value of regulatory_hint_indoor() is always 0 for success, and the return value of regulatory_hint_found_beacon() is always ignored. Make them both have void return. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: add return docs for sta_info_flush()  (Johannes Berg)
Use the Return: annotation instead of spelling out "Returns" in the documentation, for both sta_info_flush()/__sta_info_flush(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: keep mac80211 consistent on link activation failure  (Benjamin Berg)
In the unlikely event that link_use_channel fails while activating a link, mac80211 would go into a bad state. Unfortunately, we cannot completely avoid failures from drivers in this case. However, what we can do is to just continue internally anyway and assume the driver is going to trigger a recovery flow from its side. Doing that means that we at least have a consistent state in mac80211 allowing such a recovery flow to succeed. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://msgid.link/20240418115219.1129e89f4b55.I6299678353e50e88b55c99b0bce15c64b52c2804@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: simplify ieee80211_assign_link_chanctx()  (Johannes Berg)
There's no need for a label/goto here, the only thing is that drv_assign_vif_chanctx() must succeed to set 'conf' and add the new context to the list, the remaining code is (and must be) the same regardless. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://msgid.link/20240418115219.a94852030d33.I9d647178ab25636372ed79e5312c68a06e0bf60c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: reserve chanctx during find  (Johannes Berg)
When searching for a chanctx for re-use, it's later adjusted and assigned. It may also be that another one is already assigned to the link in question, so unassign can also happen. In short, the driver is called multiple times. During these callbacks, it may thus change active links (on another interface), which then can in turn cause the found chanctx (that's going to be reused) to get removed and freed. To avoid this, temporarily assign it to the reserved chanctx and track the link that wants to use it in the reserved_links list. This causes the ieee80211_chanctx_refcount() to be increased by one during these operations, thus avoiding the free. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://msgid.link/20240418115219.94ea84c8ee1e.I0b247dbc0cd937ae6367bc0fc7e8d156b5d5f9b1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: defer link switch work in reconfig  (Miri Korenblit)
If link switch work was queued and then a restart happened, the worker might be executed before the reconfig, and it will obviously fail (the HW might not respond to updates, etc.). So don't perform the switch if we are in reconfig; instead, do it at the end of the reconfig. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240415112355.1ef1008e3a0a.I19add3f2152dcfd55a759de97b1d09265c1cde98@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: transmit deauth only if link is available  (Johannes Berg)
There's an issue in that when we disconnect from an AP due to the AP switching to an unsupported channel, we might not tell the driver about this before we try to send the deauth. If the underlying implementation has detected the quiet CSA, this may cause issues if this is the only active link. Avoid this by transmitting (and flushing) the deauth only when there's an active link available that's not affected by quiet CSA. Since this introduces link->u.mgd.csa_blocked_tx and we no longer check sdata->csa_blocked_tx for the TX itself, also rename the latter to csa_blocked_queues. Fixes: 6f0107d195a8 ("wifi: mac80211: introduce a feature flag for quiet in CSA") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240415112355.1d91db5e95aa.Iad3a5df3367f305dff48cd61776abfd6cf0fd4ab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: mac80211: fix unaligned le16 access  (Johannes Berg)
The AP removal timer field need not be aligned, so the code shouldn't access it directly but use unaligned loads. Use get_unaligned_le16(), which is even shorter than the current code since it doesn't need a cast. Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element") Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240418105220.356788ba0045.I2b3cdb3644e205d5bb10322c345c0499171cf5d2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
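Roughly the difference, with the pointer and variable names chosen for illustration:

    /* before: dereferencing a potentially unaligned __le16 */
    removal_timer = le16_to_cpu(*(const __le16 *)pos);

    /* after: safe on all architectures, and shorter */
    removal_timer = get_unaligned_le16(pos);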
2024-04-19  wifi: mac80211: remove link before AP  (Johannes Berg)
If the AP removal timer is long, we don't really want to remove the link immediately. However, we really should do it _before_ the AP removes it (which happens at or after the count reaches 0), so subtract 1 from the countdown when scheduling the timer. This causes the link removal work to run just after the beacon with value 1 is received. If the counter is already zero, do it immediately. This fixes an issue where we do the removal too late and receive a beacon from the AP that's no longer associated with the MLD and has thus removed the EHT and ML elements, and then we instead disconnect from the whole MLD, since one of the associated APs changed mode from EHT to HE. Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element") Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240418105220.03ac4a09fa74.Ifb8c8d38e3402721a81ce5981568f47b5c5889cb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-04-19  wifi: nl80211: don't free NULL coalescing rule  (Johannes Berg)
If the parsing fails, we can dereference a NULL pointer here. Cc: stable@vger.kernel.org Fixes: be29b99a9b51 ("cfg80211/nl80211: Add packet coalesce support") Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240418105220.b328f80406e7.Id75d961050deb05b3e4e354e024866f350c68103@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
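The shape of the fix is a simple guard in the free path; a simplified sketch with the inner rule/pattern frees omitted:

    static void free_coalesce(struct cfg80211_coalesce *coalesce)
    {
        /* nl80211 parsing may fail before anything was allocated,
         * in which case the pointer is still NULL */
        if (!coalesce)
            return;

        kfree(coalesce);
    }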