summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-06-16ip, ip6: Fix splice to raw and ping socketsDavid Howells
Splicing to SOCK_RAW sockets may set MSG_SPLICE_PAGES, but in such a case, __ip_append_data() will call skb_splice_from_iter() to access the 'from' data, assuming it to point to a msghdr struct with an iter, instead of using the provided getfrag function to access it. In the case of raw_sendmsg(), however, this is not the case and 'from' will point to a raw_frag_vec struct and raw_getfrag() will be the frag-getting function. A similar issue may occur with rawv6_sendmsg(). Fix this by ignoring MSG_SPLICE_PAGES if getfrag != ip_generic_getfrag as ip_generic_getfrag() expects "from" to be a msghdr*, but the other getfrags don't. Note that this will prevent MSG_SPLICE_PAGES from being effective for udplite. This likely affects ping sockets too. udplite looks like it should be okay as it expects "from" to be a msghdr. Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000ae4cbf05fdeb8349@google.com/ Fixes: 2dc334f1a63a ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()") Tested-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/1410156.1686729856@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-16xfrm: Linearize the skb after offloading if needed.Sebastian Andrzej Siewior
With offloading enabled, esp_xmit() gets invoked very late, from within validate_xmit_xfrm() which is after validate_xmit_skb() validates and linearizes the skb if the underlying device does not support fragments. esp_output_tail() may add a fragment to the skb while adding the auth tag/ IV. Devices without the proper support will then send skb->data points to with the correct length so the packet will have garbage at the end. A pcap sniffer will claim that the proper data has been sent since it parses the skb properly. It is not affected with INET_ESP_OFFLOAD disabled. Linearize the skb after offloading if the sending hardware requires it. It was tested on v4, v6 has been adopted. Fixes: 7785bba299a8d ("esp: Add a software GRO codepath") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-15net: add check for current MAC address in dev_set_mac_addressPiotr Gardocki
In some cases it is possible for kernel to come with request to change primary MAC address to the address that is already set on the given interface. Add proper check to return fast from the function in these cases. An example of such case is adding an interface to bonding channel in balance-alb mode: modprobe bonding mode=balance-alb miimon=100 max_bonds=1 ip link set bond0 up ifenslave bond0 <eth> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15net: ioctl: Use kernel memory on protocol ioctl callbacksBreno Leitao
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers. Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback). This changes the "struct proto" ioctl format in the following way: int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg); (Important to say that this patch does not touch the "struct proto_ops" protocols) So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls: 1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace. The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions: * Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace. * Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6 * Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace. For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space. The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/mlx5/driver.h 617f5db1a626 ("RDMA/mlx5: Fix affinity assignment") dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported") 425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not") 45b1a1227a7a ("mptcp: introduces more address related mibs") 0639fa230a21 ("selftests: mptcp: add explicit check for new mibs") https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15dccp: Print deprecation notice.Kuniyuki Iwashima
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"). It says we haven't heard from the maintainer for five years, so DCCP is not well maintained for 7 years now. Recently DCCP only receives updates for bugs, and major distros disable it by default. Removing DCCP would allow for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. Let's add a deprecation notice when DCCP socket is created and schedule its removal to 2025. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15udplite: Print deprecation notice.Kuniyuki Iwashima
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs when a UDP-Lite socket tries to allocate a buffer under memory pressure. Someone should have stumbled on the bug much earlier if UDP-Lite had been used in a real app. Also, we do not always need a large UDP-Lite workload to hit the bug since UDP and UDP-Lite share the same memory accounting limit. Removing UDP-Lite would simplify UDP code removing a bunch of conditionals in fast path. Let's add a deprecation notice when UDP-Lite socket is created and schedule its removal to 2025. Link: https://lore.kernel.org/netdev/20230523163305.66466-1-kuniyu@amazon.com/ [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15net: tipc: resize nlattr array to correct sizeLin Ma
According to nla_parse_nested_deprecated(), the tb[] is supposed to the destination array with maxtype+1 elements. In current tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used which is unnecessary. This patch resize them to a proper size. Fixes: 1e55417d8fc6 ("tipc: add media set to new netlink api") Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15net: tls: make the offload check helper take skb not socketJakub Kicinski
All callers of tls_is_sk_tx_device_offloaded() currently do an equivalent of: if (skb->sk && tls_is_skb_tx_device_offloaded(skb->sk)) Have the helper accept skb and do the skb->sk check locally. Two drivers have local static inlines with similar wrappers already. While at it change the ifdef condition to TLS_DEVICE. Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are equivalent. This makes removing the duplicated IS_ENABLED() check in funeth more obviously correct. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-15netpoll: allocate netdev tracker right awayJakub Kicinski
Commit 5fa5ae605821 ("netpoll: add net device refcount tracker to struct netpoll") was part of one of the initial netdev tracker introduction patches. It added an explicit netdev_tracker_alloc() for netpoll, presumably because the flow of the function is somewhat odd. After most of the core networking stack was converted to use the tracking hold() variants, netpoll's call to old dev_hold() stands out a bit. np is allocated by the caller and ready to use, we can use netdev_hold() here, even tho np->ndev will only be set to ndev inside __netpoll_setup(). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-15net: create device lookup API with reference trackingJakub Kicinski
New users of dev_get_by_index() and dev_get_by_name() keep getting added and it would be nice to steer them towards the APIs with reference tracking. Add variants of those calls which allocate the reference tracker and use them in a couple of places. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-14net/sched: cls_api: Fix lockup on flushing explicitly created chainVlad Buslov
Mingshuai Ren reports: When a new chain is added by using tc, one soft lockup alarm will be generated after delete the prio 0 filter of the chain. To reproduce the problem, perform the following steps: (1) tc qdisc add dev eth0 root handle 1: htb default 1 (2) tc chain add dev eth0 (3) tc filter del dev eth0 chain 0 parent 1: prio 0 (4) tc filter add dev eth0 chain 0 parent 1: Fix the issue by accounting for additional reference to chains that are explicitly created by RTM_NEWCHAIN message as opposed to implicitly by RTM_NEWTFILTER message. Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush") Reported-by: Mingshuai Ren <renmingshuai@huawei.com> Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/ Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14rtnetlink: move validate_linkmsg out of do_setlinkXin Long
This patch moves validate_linkmsg() out of do_setlink() to its callers and deletes the early validate_linkmsg() call in __rtnl_newlink(), so that it will not call validate_linkmsg() twice in either of the paths: - __rtnl_newlink() -> do_setlink() - __rtnl_newlink() -> rtnl_newlink_create() -> rtnl_create_link() Additionally, as validate_linkmsg() is now only called with a real dev, we can remove the NULL check for dev in validate_linkmsg(). Note that we moved validate_linkmsg() check to the places where it has not done any changes to the dev, as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/cf2ef061e08251faf9e8be25ff0d61150c030475.1686585334.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14net/handshake: remove fput() that causes use-after-freeLin Ma
A reference underflow is found in TLS handshake subsystem that causes a direct use-after-free. Part of the crash log is like below: [ 2.022114] ------------[ cut here ]------------ [ 2.022193] refcount_t: underflow; use-after-free. [ 2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110 [ 2.022432] Modules linked in: [ 2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110 [ 2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286 [ 2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff [ 2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001 [ 2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff [ 2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8 [ 2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8 [ 2.023930] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 2.024062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0 [ 2.024275] Call Trace: [ 2.024322] <TASK> [ 2.024367] ? __warn+0x7f/0x130 [ 2.024430] ? refcount_warn_saturate+0xbe/0x110 [ 2.024513] ? report_bug+0x199/0x1b0 [ 2.024585] ? handle_bug+0x3c/0x70 [ 2.024676] ? exc_invalid_op+0x18/0x70 [ 2.024750] ? asm_exc_invalid_op+0x1a/0x20 [ 2.024830] ? refcount_warn_saturate+0xbe/0x110 [ 2.024916] ? refcount_warn_saturate+0xbe/0x110 [ 2.024998] __tcp_close+0x2f4/0x3d0 [ 2.025065] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [ 2.025168] tcp_close+0x1f/0x70 [ 2.025231] inet_release+0x33/0x60 [ 2.025297] sock_release+0x1f/0x80 [ 2.025361] handshake_req_cancel_test2+0x100/0x2d0 [ 2.025457] kunit_try_run_case+0x4c/0xa0 [ 2.025532] kunit_generic_run_threadfn_adapter+0x15/0x20 [ 2.025644] kthread+0xe1/0x110 [ 2.025708] ? __pfx_kthread+0x10/0x10 [ 2.025780] ret_from_fork+0x2c/0x50 One can enable CONFIG_NET_HANDSHAKE_KUNIT_TEST config to reproduce above crash. The root cause of this bug is that the commit 1ce77c998f04 ("net/handshake: Unpin sock->file if a handshake is cancelled") adds one additional fput() function. That patch claims that the fput() is used to enable sock->file to be freed even when user space never calls DONE. However, it seems that the intended DONE routine will never give an additional fput() of ths sock->file. The existing two of them are just used to balance the reference added in sockfd_lookup(). This patch revert the mentioned commit to avoid the use-after-free. The patched kernel could successfully pass the KUNIT test and boot to shell. [ 0.733613] # Subtest: Handshake API tests [ 0.734029] 1..11 [ 0.734255] KTAP version 1 [ 0.734542] # Subtest: req_alloc API fuzzing [ 0.736104] ok 1 handshake_req_alloc NULL proto [ 0.736114] ok 2 handshake_req_alloc CLASS_NONE [ 0.736559] ok 3 handshake_req_alloc CLASS_MAX [ 0.737020] ok 4 handshake_req_alloc no callbacks [ 0.737488] ok 5 handshake_req_alloc no done callback [ 0.737988] ok 6 handshake_req_alloc excessive privsize [ 0.738529] ok 7 handshake_req_alloc all good [ 0.739036] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7 [ 0.739444] ok 1 req_alloc API fuzzing [ 0.740065] ok 2 req_submit NULL req arg [ 0.740436] ok 3 req_submit NULL sock arg [ 0.740834] ok 4 req_submit NULL sock->file [ 0.741236] ok 5 req_lookup works [ 0.741621] ok 6 req_submit max pending [ 0.741974] ok 7 req_submit multiple [ 0.742382] ok 8 req_cancel before accept [ 0.742764] ok 9 req_cancel after accept [ 0.743151] ok 10 req_cancel after done [ 0.743510] ok 11 req_destroy works [ 0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11 [ 0.744205] # Totals: pass:17 fail:0 skip:0 total:17 Acked-by: Chuck Lever <chuck.lever@oracle.com> Fixes: 1ce77c998f04 ("net/handshake: Unpin sock->file if a handshake is cancelled") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14Merge tag 'wireless-2023-06-14' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A couple of straggler fixes, mostly in the stack: - fix fragmentation for multi-link related elements - fix callback copy/paste error - fix multi-link locking - remove double-locking of wiphy mutex - transmit only on active links, not all - activate links in the correct order - don't remove links that weren't added - disable soft-IRQs for LQ lock in iwlwifi * tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression wifi: mac80211: fragment per STA profile correctly wifi: mac80211: Use active_links instead of valid_links in Tx wifi: cfg80211: remove links only on AP wifi: mac80211: take lock before setting vif links wifi: cfg80211: fix link del callback to call correct handler wifi: mac80211: fix link activation settings order wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid() ==================== Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFOEdwin Peer
This filter already exists for excluding IPv6 SNMP stats. Extend its definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK. This patch constitutes a partial fix for a netlink attribute nesting overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the requester doesn't need them, the truncation of the VF list is avoided. While it was technically only the stats added in commit c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics") breaking the camel's back, the appreciable size of the stats data should never have been included without due consideration for the maximum number of VFs supported by PCI. Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice") Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Cc: Edwin Peer <espeer@gmail.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20230611105108.122586-1-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14wifi: mac80211: Replace strlcpy with strscpyAzeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since LOCAL_ASSIGN is only used by TRACE macros and the return values are ignored. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230613003404.3538524-1-azeemshaikh38@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: cfg80211: replace strlcpy() with strscpy()Azeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since WIPHY_ASSIGN is only used by TRACE macros and the return values are ignored. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230612232301.2572316-1-azeemshaikh38@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: Fix permissions for valid_links debugfs entryIlan Peer
The entry should be a read only one and not a write only one. Fix it. Fixes: 3d9011029227 ("wifi: mac80211: implement link switching") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230611121219.c75316990411.I1565a7fcba8a37f83efffb0cc6b71c572b896e94@changeid [remove x16 change since it doesn't work yet] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: cfg80211: Support association to AP MLD with disabled linksIlan Peer
An AP part of an AP MLD might be temporarily disabled, and might be enabled later. Such a link should be included in the association exchange, but should not be used until enabled. Extend the NL80211_CMD_ASSOCIATE to also indicate disabled links. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.c4c61ee4c4a5.I784ef4a0d619fc9120514b5615458fbef3b3684a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: Add getter functions for vif MLD stateIlan Peer
As a preparation to support disabled/dormant links, add the following function: - ieee80211_vif_usable_links(): returns the bitmap of the links that can be activated. Use this function in all the places that the bitmap of the usable links is needed. - ieee80211_vif_is_mld(): returns true iff the vif is an MLD. Use this function in all the places where an indication that the connection is a MLD is needed. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.86e3351da1fc.If6fe3a339fda2019f13f57ff768ecffb711b710a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: allow disabling SMPS debugfs controlsMiri Korenblit
There are cases in which we don't want the user to override the smps mode, e.g. when SMPS should be disabled due to EMLSR. Add a driver flag to disable SMPS overriding and don't override if it is set. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.ef129e80556c.I74a298fdc86b87074c95228d3916739de1400597@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: don't update rx_stats.last_rate for NDPJohannes Berg
If we get an NDP (null data packet), there's reason to believe the peer is just sending it to probe, and that would happen at a low rate. Don't track this packet for purposes of last RX rate reporting. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.8af46c4ac094.I13d9d5019addeaa4aff3c8a05f56c9f5a86b1ebd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: fix CSA processing while scanningBenjamin Berg
The channel switch parsing code would simply return if a scan is in-progress. Supposedly, this was because channel switch announcements from other APs should be ignored. For the beacon case, the function is already only called if we are associated with the sender. For the action frame cases, add the appropriate check whether the frame is coming from the AP we are associated with. Finally, drop the scanning check from ieee80211_sta_process_chanswitch. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.3366e9302468.I6c7e0b58c33b7fb4c675374cfe8c3a5cddcec416@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: mlme: clarify WMM messagesJohannes Berg
These messages apply to a single link only, use link_info() to indicate that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.21a6bece4313.I08118e5e851fae2f9e43f8a58d3b6217709bf578@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: pass roc->sdata to drv_cancel_remain_on_channel()Anjaneyulu
In suspend flow "sdata" is NULL, destroy all roc's which are started. pass "roc->sdata" to drv_cancel_remain_on_channel() to avoid NULL dereference and destroy that roc Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.c678187a308c.Ic11578778655e273931efc5355d570a16465d1be@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: include key action/command in tracingJohannes Berg
We trace the key information and all, but not whether the key is added or removed - add that information. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.546e86e216df.Ie3bf9009926f8fa154dde52b0c02537ff7edae36@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: mac80211: add helpers to access sband iftype dataJohannes Berg
There's quite a bit of code accessing sband iftype data (HE, HE 6 GHz, EHT) and we always need to remember to use the ieee80211_vif_type_p2p() helper. Add new helpers to directly get it from the sband/vif rather than having to call ieee80211_vif_type_p2p(). Convert most code with the following spatch: @@ expression vif, sband; @@ -ieee80211_get_he_iftype_cap(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_he_iftype_cap_vif(sband, vif) @@ expression vif, sband; @@ -ieee80211_get_eht_iftype_cap(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_eht_iftype_cap_vif(sband, vif) @@ expression vif, sband; @@ -ieee80211_get_he_6ghz_capa(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_he_6ghz_capa_vif(sband, vif) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.db099f49e764.Ie892966c49e22c7b7ee1073bc684f142debfdc84@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14wifi: cfg80211: S1G rate information and calculationsGilad Itzkovitch
Increase the size of S1G rate_info flags to support S1G and add flags for new S1G MCS and the supported bandwidths. Also, include S1G rate information to netlink STA rate message. Lastly, add rate calculation function for S1G MCS. Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com> Link: https://lore.kernel.org/r/20230518000723.991912-1-gilad.itzkovitch@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-14net/sched: qdisc_destroy() old ingress and clsact Qdiscs before graftingPeilin Ye
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs access this per-net_device pointer in mini_qdisc_pair_swap(). Similar for clsact Qdiscs and miniq_egress. Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER requests (thanks Hillf Danton for the hint), when replacing ingress or clsact Qdiscs, for example, the old Qdisc ("@old") could access the same miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), causing race conditions [1] including a use-after-free bug in mini_qdisc_pair_swap() reported by syzbot: BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901 ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319 print_report mm/kasan/report.c:430 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:536 mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 tcf_chain_head_change_item net/sched/cls_api.c:495 [inline] tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509 tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline] tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline] tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266 ... @old and @new should not affect each other. In other words, @old should never modify miniq_{in,e}gress after @new, and @new should not update @old's RCU state. Fixing without changing sch_api.c turned out to be difficult (please refer to Closes: for discussions). Instead, make sure @new's first call always happen after @old's last call (in {ingress,clsact}_destroy()) has finished: In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, and call qdisc_destroy() for @old before grafting @new. Introduce qdisc_refcount_dec_if_one() as the counterpart of qdisc_refcount_inc_nz() used for filter requests. Introduce a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, just like qdisc_put() etc. Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs". [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched: sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8 transmission queues: Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then adds a flower filter X to A. Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and b2) to replace A, then adds a flower filter Y to B. Thread 1 A's refcnt Thread 2 RTM_NEWQDISC (A, RTNL-locked) qdisc_create(A) 1 qdisc_graft(A) 9 RTM_NEWTFILTER (X, RTNL-unlocked) __tcf_qdisc_find(A) 10 tcf_chain0_head_change(A) mini_qdisc_pair_swap(A) (1st) | | RTM_NEWQDISC (B, RTNL-locked) RCU sync 2 qdisc_graft(B) | 1 notify_and_destroy(A) | tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked) qdisc_destroy(A) tcf_chain0_head_change(B) tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd) mini_qdisc_pair_swap(A) (3rd) | ... ... Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets on eth0 will not find filter Y in sch_handle_ingress(). This is just one of the possible consequences of concurrently accessing miniq_{in,e}gress pointers. Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops") Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops") Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/ Cc: Hillf Danton <hdanton@sina.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14net/sched: Refactor qdisc_graft() for ingress and clsact QdiscsPeilin Ye
Grafting ingress and clsact Qdiscs does not need a for-loop in qdisc_graft(). Refactor it. No functional changes intended. Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14net/sched: act_ct: Fix promotion of offloaded unreplied tuplePaul Blakey
Currently UNREPLIED and UNASSURED connections are added to the nf flow table. This causes the following connection packets to be processed by the flow table which then skips conntrack_in(), and thus such the connections will remain UNREPLIED and UNASSURED even if reply traffic is then seen. Even still, the unoffloaded reply packets are the ones triggering hardware update from new to established state, and if there aren't any to triger an update and/or previous update was missed, hardware can get out of sync with sw and still mark packets as new. Fix the above by: 1) Not skipping conntrack_in() for UNASSURED packets, but still refresh for hardware, as before the cited patch. 2) Try and force a refresh by reply-direction packets that update the hardware rules from new to established state. 3) Remove any bidirectional flows that didn't failed to update in hardware for re-insertion as bidrectional once any new packet arrives. Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-13ethtool: ioctl: account for sopass diff in set_wolJustin Chen
sopass won't be set if wolopt doesn't change. This means the following will fail to set the correct sopass. ethtool -s eth0 wol s sopass 11:22:33:44:55:66 ethtool -s eth0 wol s sopass 22:44:55:66:77:88 Make sure we call into the driver layer set_wol if sopass is different. Fixes: 55b24334c0f2 ("ethtool: ioctl: improve error checking for set_wol") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Link: https://lore.kernel.org/r/1686605822-34544-1-git-send-email-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Send multiple frags in one sendmsg()David Howells
Rewrite the AF_KCM transmission loop to send all the fragments in a single skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list of fragments in each skb is conveniently a bio_vec[] that can just be attached to a BVEC iter. Note: I'm working out the size of each fragment-skb by adding up bv_len for all the bio_vecs in skb->frags[] - but surely this information is recorded somewhere? For the skbs in head->frag_list, this is equal to skb->data_len, but not for the head. head->data_len includes all the tail frags too. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into the transport socket using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than using sendpage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES)David Howells
Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(MSG_SPLICE_PAGES) rather than a loop calling tcp_sendpage(). sendpage() will be removed in the future. Signed-off-by: David Howells <dhowells@redhat.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into TCP using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing sendpage calls to transmit header, data pages and trailer. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flower: add support for matching cfm fieldsZahari Doychev
Add support to the tc flower classifier to match based on fields in CFM information elements like level and opcode. tc filter add dev ens6 ingress protocol 802.1q \ flower vlan_id 698 vlan_ethtype 0x8902 cfm mdl 5 op 46 \ action drop Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flow_dissector: add support for cfm packetsZahari Doychev
Add support for dissecting cfm packets. The cfm packet header fields maintenance domain level and opcode can be dissected. Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")Chuck Lever
Get rid of the completion wait in svc_rdma_sendto(), and release pages in the send completion handler again. A subsequent patch will handle releasing those pages more efficiently. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")Chuck Lever
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing ↵Chuck Lever
rq_res.head[0].iov_base") Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_rw_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a rw_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_send_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a send_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_recv_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a recv_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. This clean up eliminates the hack of destroying recv_ctxts that were not created by the receive CQ thread -- recv_ctxts are now always allocated on a "good" node. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Allocate new transports on device's NUMA nodeChuck Lever
The physical device's NUMA node ID is available when allocating an svc_xprt for an incoming connection. Use that value to ensure the svc_xprt structure is allocated on the NUMA node closest to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12tcp: remove size parameter from tcp_stream_alloc_skb()Eric Dumazet
Now all tcp_stream_alloc_skb() callers pass @size == 0, we can remove this parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tcp: remove some dead codeEric Dumazet
Now all skbs in write queue do not contain any payload in skb->head, we can remove some dead code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tcp: let tcp_send_syn_data() build headless packetsEric Dumazet
tcp_send_syn_data() is the last component in TCP transmit path to put payload in skb->head. Switch it to use page frags, so that we can remove dead code later. This allows to put more payload than previous implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: ethtool: don't require empty header nestsJakub Kicinski
Ethtool currently requires a header nest (which is used to carry the common family options) in all requests including dumps. $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 64 (48) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'request header missing'} $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \ --json '{"header":{}}'; ) [{'combined-count': 1, 'combined-max': 1, 'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}] Requiring the header nest to always be there may seem nice from the consistency perspective, but it's not serving any practical purpose. We shouldn't burden the user like this. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>