summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-02-12net: add rcu safety to rtnl_prop_list_size()Eric Dumazet
rtnl_prop_list_size() can be called while alternative names are added or removed concurrently. if_nlmsg_size() / rtnl_calcit() can indeed be called without RTNL held. Use explicit RCU protection to avoid UAF. Fixes: 88f4fb0c7496 ("net: rtnetlink: put alternative names to getlink message") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240209181248.96637-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12ipv4: Set the routing scope properly in ip_route_output_ports().Guillaume Nault
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload the tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does. For callers that don't pass a struct sock, this doesn't change anything as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL. Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with the RTO_ONLINK flag now becomes unnecessary. In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk parameter is a kernel socket, which doesn't have any configuration path for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports() will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c doesn't need to be modified. Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as these macros are now unused. The objective is to eventually remove RTO_ONLINK entirely to allow converting ->flowi4_tos to dscp_t. This will ensure proper isolation between the DSCP and ECN bits, thus minimising the risk of introducing bugs where TOS values interfere with ECN. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12wifi: cfg80211: report unprotected deauth/disassoc in wowlanShaul Triebitz
Add to cfg80211_wowlan_wakeup another wakeup reason - unprot_deauth_disassoc. To be set to true if the woke up was due to an unprotected deauth or disassoc frame in MFP. In that case report WOWLAN_TRIG_UNPROTECTED_DEAUTH_DISASSOC. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240206164849.a3d739850d03.I8f52a21c4f36d1af1f8068bed79e2f9cbf8289ef@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: mac80211: drop injection on disabled-chan monitorJohannes Berg
If the driver uses the new IEEE80211_CHAN_CAN_MONITOR, we may monitor on channels that are, e.g. via regulatory, otherwise considered disabled. However, we really shouldn't transmit on them, so prevent that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240206164849.9c03dcf67dbe.Ib86a851c274c440908c663f6dd774b79bfc3965d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: cfg80211: optionally support monitor on disabled channelsJohannes Berg
If the hardware supports a disabled channel, it may in some cases be possible to use monitor mode (without any transmit) on it when it's otherwise disabled. Add a new channel flag IEEE80211_CHAN_CAN_MONITOR that makes it possible for a driver to indicate such a thing. Make it per channel so drivers could have a choice with it, perhaps it's only possible on some channels, perhaps some channels are not supported at all, but still there and marked disabled. In _nl80211_parse_chandef() simplify the code and check only for an unknown channel, _cfg80211_chandef_usable() will later check for IEEE80211_CHAN_DISABLED anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240206164849.87fad3a21a09.I9116b2fdc2e2c9fd59a9273a64db7fcb41fc0328@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: cfg80211: rename UHB to 6 GHzJohannes Berg
UHB stands for "Ultra High Band", but this term doesn't really exist in the spec. Rename all occurrences to "6 GHz", but keep a few defines for userspace API compatibility. Link: https://msgid.link/20240206164849.c9cfb9400839.I153db3b951934a1d84409c17fbe1f1d1782543fa@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: mac80211: remove only own link stations during stop_apAditya Kumar Singh
Currently, whenever AP link is brought down via ieee80211_stop_ap() function, all stations connected to the sdata are flushed. However, in case of MLO there is a requirement to flush only stations connected to that link and not all. For instance - Consider 2 GHz and 5 GHz are AP MLD. Now due to some reason 5 GHz link of this AP is going down (link removal or any other case). All stations connected, even legacy stations connected to 2 GHz link AP would also be flushed. Flushing of other link stations is not desirable. Fix this issue by passing self link ID to sta_flush() function. This would then only remove the stations which are still using the passed link ID as their link sta. Other stations will not be affected. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240205162952.1697646-4-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: mac80211: flush only stations using requests linksAditya Kumar Singh
Whenever sta_flush() function is invoked, all STAs present in that interface are flushed. In case of MLO, it is desirable to only flush such STAs that are at least using a given link id as one of their links. Add support for this by making change in the __sta_info_flush API argument to accept a link ID. And then, only if the STA is using the given link as one of its links, it would be flushed. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240205162952.1697646-3-quic_adisi@quicinc.com [reword commit message, in particular this isn't about "active" links] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: cfg80211: add support for link id attribute in NL80211_CMD_DEL_STATIONAditya Kumar Singh
Currently whenever NL80211_CMD_DEL_STATION command is called without any MAC address, all stations present on that interface are flushed. However with MLO there is a need to flush such stations only which are using at least a particular link from the AP MLD interface. For example - 2 GHz and 5 GHz are part of an AP MLD. To this interface, following stations are connected - 1. One non-EHT STA on 2 GHz link. 2. One non-EHT STA on 5 GHz link. 3. One Multi-Link STA having 2 GHz and 5 GHz as active links. Now if currently, NL80211_CMD_DEL_STATION is issued by the 2 GHz link without any MAC address, it would flush all station entries. However, flushing of station entry #2 at least is not desireable since it is connected to 5 GHz link alone. Hence, add an option to pass link ID as well in the command so that if link ID is passed, stations using that passed link ID alone would be flushed and others will not. So after this, station entries #1 and #3 alone would be flushed and #2 will remain as it is. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240205162952.1697646-2-quic_adisi@quicinc.com [clarify documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12wifi: mac80211: remove gfp parameter from ieee80211_obss_color_collision_notifyLorenzo Bianconi
Get rid of gfp parameter from ieee80211_obss_color_collision_notify since it is no longer used. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/f91e1c78896408ac556586ba8c99e4e389aeba02.1707389901.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-12can: isotp: support dynamic flow control parametersOliver Hartkopp
The ISO15765-2 standard supports to take the PDUs communication parameters blocksize (BS) and Separation Time minimum (STmin) either from the first received flow control (FC) "static" or from every received FC "dynamic". Add a new CAN_ISOTP_DYN_FC_PARMS flag to support dynamic FC parameters. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20231208165729.3011-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-12can: bcm: add recvmsg flags for own, local and remote trafficNicolas Maier
CAN RAW sockets allow userspace to tell if a received CAN frame comes from the same socket, another socket on the same host, or another host. See commit 1e55659ce6dd ("can-raw: add msg_flags to distinguish local traffic"). However, this feature is missing in CAN BCM sockets. Add the same feature to CAN BCM sockets. When reading a received frame (opcode RX_CHANGED) using recvmsg, two flags in msg->msg_flags may be set following the previous convention (from CAN RAW), to distinguish between 'own', 'local' and 'remote' CAN traffic. Update the documentation to reflect this change. Signed-off-by: Nicolas Maier <nicolas.maier.dev@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20240120081018.2319-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-12netfilter: conntrack: expedite rcu in nf_conntrack_cleanup_net_listEric Dumazet
nf_conntrack_cleanup_net_list() is calling synchronize_net() while RTNL is not held. This effectively calls synchronize_rcu(). synchronize_rcu() is much slower than synchronize_rcu_expedited(), and cleanup_net() is currently single threaded. In many workloads we want cleanup_net() to be faster, in order to free memory and various sysfs and procfs entries as fast as possible. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net: use synchronize_rcu_expedited in cleanup_net()Eric Dumazet
cleanup_net() is calling synchronize_rcu() right before acquiring RTNL. synchronize_rcu() is much slower than synchronize_rcu_expedited(), and cleanup_net() is currently single threaded. In many workloads we want cleanup_net() to be fast, in order to free memory and various sysfs and procfs entries as fast as possible. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12ipv4/fib: use synchronize_net() when holding RTNLEric Dumazet
tnode_free() should use synchronize_net() instead of syncronize_rcu() to release RTNL sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12bridge: vlan: use synchronize_net() when holding RTNLEric Dumazet
br_vlan_flush() and nbp_vlan_flush() should use synchronize_net() instead of syncronize_rcu() to release RTNL sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net: use synchronize_net() in dev_change_name()Eric Dumazet
dev_change_name() holds RTNL, we better use synchronize_net() instead of plain synchronize_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12ipv6: mcast: remove one synchronize_net() barrier in ipv6_mc_down()Eric Dumazet
As discussed in the past (commit 2d3916f31891 ("ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()")) I think the synchronize_net() call in ipv6_mc_down() is not needed. Under load, synchronize_net() can last between 200 usec and 5 ms. KASAN seems to agree as well. Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.Kui-Feng Lee
Make the decision to set or clean the expires of a route based on the RTF_EXPIRES flag, rather than the value of the "expires" argument. This patch doesn't make difference logically, but make inet6_addr_modify() and modify_prefix_route() consistent. The function inet6_addr_modify() is the only caller of modify_prefix_route(), and it passes the RTF_EXPIRES flag and an expiration value. The RTF_EXPIRES flag is turned on or off based on the value of valid_lft. The RTF_EXPIRES flag is turned on if valid_lft is a finite value (not infinite, not 0xffffffff). Even if valid_lft is 0, the RTF_EXPIRES flag remains on. The expiration value being passed is equal to the valid_lft value if the flag is on. However, if the valid_lft value is infinite, the expiration value becomes 0 and the RTF_EXPIRES flag is turned off. Despite this, modify_prefix_route() decides to set the expiration value if the received expiration value is not zero. This mixing of infinite and zero cases creates an inconsistency. Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net/ipv6: Remove expired routes with a separated list of routes.Kui-Feng Lee
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree can be expensive if the number of routes in a table is big, even if most of them are permanent. Checking routes in a separated list of routes having expiration will avoid this potential issue. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net/ipv6: Remove unnecessary clean.Kui-Feng Lee
The route here is newly created. It is unnecessary to call fib6_clean_expires() on it. Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net/ipv6: set expires in rt6_add_dflt_router().Kui-Feng Lee
Pass the duration of a lifetime (in seconds) to the function rt6_add_dflt_router() so that it can properly set the expiration time. The function ndisc_router_discovery() is the only one that calls rt6_add_dflt_router(), and it will later set the expiration time for the route created by rt6_add_dflt_router(). However, there is a gap of time between calling rt6_add_dflt_router() and setting the expiration time in ndisc_router_discovery(). During this period, there is a possibility that a new route may be removed from the routing table. By setting the correct expiration time in rt6_add_dflt_router(), we can prevent this from happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is to start the Garbage Collection (GC) timer, as it only activates when a route with RTF_EXPIRES is added to a table. Suggested-by: David Ahern <dsahern@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: really cope with fastopen racePaolo Abeni
Fastopen and PM-trigger subflow shutdown can race, as reported by syzkaller. In my first attempt to close such race, I missed the fact that the subflow status can change again before the subflow_state_change callback is invoked. Address the issue additionally copying with all the states directly reachable from TCP_FIN_WAIT1. Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") Cc: stable@vger.kernel.org Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: check addrs list in userspace_pm_get_local_idGeliang Tang
Before adding a new entry in mptcp_userspace_pm_get_local_id(), it's better to check whether this address is already in userspace pm local address list. If it's in the list, no need to add a new entry, just return it's address ID and use this address. Fixes: 8b20137012d9 ("mptcp: read attributes of addr entries managed by userspace PMs") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <geliang.tang@linux.dev> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: corner case locking for rx path fields initializationPaolo Abeni
Most MPTCP-level related fields are under the mptcp data lock protection, but are written one-off without such lock at MPC complete time, both for the client and the server Leverage the mptcp_propagate_state() infrastructure to move such initialization under the proper lock client-wise. The server side critical init steps are done by mptcp_subflow_fully_established(): ensure the caller properly held the relevant lock, and avoid acquiring the same lock in the nested scopes. There are no real potential races, as write access to such fields is implicitly serialized by the MPTCP state machine; the primary goal is consistency. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: fix more tx path fields initializationPaolo Abeni
The 'msk->write_seq' and 'msk->snd_nxt' are always updated under the msk socket lock, except at MPC handshake completiont time. Builds-up on the previous commit to move such init under the relevant lock. There are no known problems caused by the potential race, the primary goal is consistency. Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: fix rcv space initializationPaolo Abeni
mptcp_rcv_space_init() is supposed to happen under the msk socket lock, but active msk socket does that without such protection. Leverage the existing mptcp_propagate_state() helper to that extent. We need to ensure mptcp_rcv_space_init will happen before mptcp_rcv_space_adjust(), and the release_cb does not assure that: explicitly check for such condition. While at it, move the wnd_end initialization out of mptcp_rcv_space_init(), it never belonged there. Note that the race does not produce ill effect in practice, but change allows cleaning-up and defying better the locking model. Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12mptcp: drop the push_pending fieldPaolo Abeni
Such field is there to avoid acquiring the data lock in a few spots, but it adds complexity to the already non trivial locking schema. All the relevant call sites (mptcp-level re-injection, set socket options), are slow-path, drop such field in favor of 'cb_flags', adding the relevant locking. This patch could be seen as an improvement, instead of a fix. But it simplifies the next patch. The 'Fixes' tag has been added to help having this series backported to stable. Fixes: e9d09baca676 ("mptcp: avoid atomic bit manipulation when possible") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12net-device: move lstats in net_device_read_txrxEric Dumazet
dev->lstats is notably used from loopback ndo_start_xmit() and other virtual drivers. Per cpu stats updates are dirtying per-cpu data, but the pointer itself is read-only. Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12tcp: move tp->scaling_ratio to tcp_sock_read_txrx groupEric Dumazet
tp->scaling_ratio is a read mostly field, used in rx and tx fast paths. Fixes: d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10net: tls: fix returned read length with async decryptJakub Kicinski
We double count async, non-zc rx data. The previous fix was lucky because if we fully zc async_copy_bytes is 0 so we add 0. Decrypted already has all the bytes we handled, in all cases. We don't have to adjust anything, delete the erroneous line. Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Co-developed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10net: tls: fix use-after-free with partial reads and async decryptSabrina Dubroca
tls_decrypt_sg doesn't take a reference on the pages from clear_skb, so the put_page() in tls_decrypt_done releases them, and we trigger a use-after-free in process_rx_list when we try to read from the partially-read skb. Fixes: fd31f3996af2 ("tls: rx: decrypt into a fresh skb") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10net: tls: handle backlogging of crypto requestsJakub Kicinski
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our requests to the crypto API, crypto_aead_{encrypt,decrypt} can return -EBUSY instead of -EINPROGRESS in valid situations. For example, when the cryptd queue for AESNI is full (easy to trigger with an artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued to the backlog but still processed. In that case, the async callback will also be called twice: first with err == -EINPROGRESS, which it seems we can just ignore, then with err == 0. Compared to Sabrina's original patch this version uses the new tls_*crypt_async_wait() helpers and converts the EBUSY to EINPROGRESS to avoid having to modify all the error handling paths. The handling is identical. Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator") Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Co-developed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10tls: fix race between tx work scheduling and socket closeJakub Kicinski
Similarly to previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do. Reported-by: valis <sec@valis.email> Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10tls: fix race between async notify and socket closeJakub Kicinski
The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data. Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization. Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires. Reported-by: valis <sec@valis.email> Fixes: 0cada33241d9 ("net/tls: fix race condition causing kernel panic") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10net: tls: factor out tls_*crypt_async_wait()Jakub Kicinski
Factor out waiting for async encrypt and decrypt to finish. There are already multiple copies and a subsequent fix will need more. No functional changes. Note that crypto_wait_req() returns wait->err Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-09Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Some fscrypt-related fixups (sparse reads are used only for encrypted files) and two cap handling fixes from Xiubo and Rishabh" * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client: ceph: always check dir caps asynchronously ceph: prevent use-after-free in encode_cap_msg() ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT libceph: just wait for more data to be available on the socket libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*() libceph: fail sparse-read if the data length doesn't match
2024-02-09work around gcc bugs with 'asm goto' with outputsLinus Torvalds
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional"). Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around. Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case. It looks like there are at least two separate issues that all hit in this area: (a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 which is easy to work around by just adding the 'volatile' by hand. (b) Internal compiler errors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround. but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'. but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue. Reported-and-tested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for net/schedBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the network schedulers. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240208164244.3818498-8-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for ipv4 modulesBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IPv4 modules. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-7-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for ipv6 modulesBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IPv6 modules. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-6-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for 6LoWPANBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to IPv6 over Low power Wireless Personal Area Network. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for af_keyBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the PF_KEY socket helpers. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-4-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for mpoaBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Multi-Protocol Over ATM (MPOA) driver. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: fill in MODULE_DESCRIPTION()s for xfrmBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the XFRM interface drivers. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net/sched: act_mirred: Don't zero blockid when net device is being deletedVictor Nogueira
While testing tdc with parallel tests for mirred to block we caught an intermittent bug. The blockid was being zeroed out when a net device was deleted and, thus, giving us an incorrect blockid value whenever we tried to dump the mirred action. Since we don't increment the block refcount in the control path (and only use the ID), we don't need to zero the blockid field whenever a net device is going down. Fixes: 42f39036cda8 ("net/sched: act_mirred: Allow mirred to block") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240207222902.1469398-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: openvswitch: limit the number of recursions from action setsAaron Conole
The ovs module allows for some actions to recursively contain an action list for complex scenarios, such as sampling, checking lengths, etc. When these actions are copied into the internal flow table, they are evaluated to validate that such actions make sense, and these calls happen recursively. The ovs-vswitchd userspace won't emit more than 16 recursion levels deep. However, the module has no such limit and will happily accept limits larger than 16 levels nested. Prevent this by tracking the number of recursions happening and manually limiting it to 16 levels nested. The initial implementation of the sample action would track this depth and prevent more than 3 levels of recursion, but this was removed to support the clone use case, rather than limited at the current userspace limit. Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases") Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240207132416.1488485-2-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09Merge branch 'for-io_uring-add-napi-busy-polling-support'Jakub Kicinski
Merge netdev bits of io_uring busy polling support. Jens Axboe says: ==================== io_uring: add napi busy polling support I finally got around to testing this patchset in its current form, and results look fine to me. It Works. Using the basic ping/pong test that's part of the liburing addition, without enabling NAPI I get: Stock settings, no NAPI, 100k packets: rtt(us) min/avg/max/mdev = 31.730/37.006/87.960/0.497 and with -t10 -b enabled: rtt(us) min/avg/max/mdev = 23.250/29.795/63.511/1.203 In short, this patchset enables per io_uring NAPI enablement, rather than need to enable that globally. This allows targeted NAPI usage with io_uring. Here's Stefan's v15 posting, which predates this one: https://lore.kernel.org/io-uring/20230608163839.2891748-1-shr@devkernel.io/ ==================== Link: https://lore.kernel.org/r/20240206163422.646218-1-axboe@kernel.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: add napi_busy_loop_rcu()Stefan Roesch
This adds the napi_busy_loop_rcu() function. This function assumes that the calling function is already holding the rcu read lock and napi_busy_loop() does not need to take the rcu read lock. Add a NAPI_F_NO_SCHED flag, which tells __napi_busy_loop() to abort if we need to reschedule rather than drop the RCU read lock and reschedule. Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20230608163839.2891748-3-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09net: split off __napi_busy_poll from napi_busy_pollStefan Roesch
This splits off the key part of the napi_busy_poll function into its own function, __napi_busy_poll, and changes the prefer_busy_poll bool to be flag based to allow passing in more flags in the future. This is done in preparation for an additional napi_busy_poll() function, that doesn't take the rcu_read_lock(). The new function is introduced in the next patch. Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20230608163839.2891748-2-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>