summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-15rds: Acquire per-CPU pointer within BH disabled sectionSebastian Andrzej Siewior
rds_page_remainder_alloc() obtains the current CPU with get_cpu() while disabling preemption. Then the CPU number is used to access the per-CPU data structure via per_cpu(). This can be optimized by relying on local_bh_disable() to provide a stable CPU number/ avoid migration and then using this_cpu_ptr() to retrieve the data structure. Cc: Allison Henderson <allison.henderson@oracle.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-15-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15rds: Disable only bottom halves in rds_page_remainder_alloc()Sebastian Andrzej Siewior
rds_page_remainder_alloc() is invoked from a preemptible context or a tasklet. There is no need to disable interrupts for locking. Use local_bh_disable() instead of local_irq_save() for locking. Cc: Allison Henderson <allison.henderson@oracle.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-14-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15mptcp: Use nested-BH locking for hmac_storageSebastian Andrzej Siewior
mptcp_delegated_actions is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Mat Martineau <martineau@kernel.org> Cc: Geliang Tang <geliang@kernel.org> Cc: mptcp@lists.linux.dev Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-13-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net/sched: Use nested-BH locking for sch_frag_data_storageSebastian Andrzej Siewior
sch_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add local_lock_t to the struct and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-12-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net/sched: act_mirred: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior
mirred_nest_level is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-11-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storageSebastian Andrzej Siewior
ovs_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move ovs_frag_data_storage into the struct ovs_pcpu_storage which already provides locking for the structure. Cc: Aaron Conole <aconole@redhat.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: Ilya Maximets <i.maximets@ovn.org> Cc: dev@openvswitch.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-10-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15openvswitch: Use nested-BH locking for ovs_pcpu_storageSebastian Andrzej Siewior
ovs_pcpu_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. The data structure can be referenced recursive and there is a recursion counter to avoid too many recursions. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. Add an owner of the struct which is the current task and acquire the lock only if the structure is not owned by the current task. Cc: Aaron Conole <aconole@redhat.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: Ilya Maximets <i.maximets@ovn.org> Cc: dev@openvswitch.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-9-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15openvswitch: Merge three per-CPU structures into oneSebastian Andrzej Siewior
exec_actions_level is a per-CPU integer allocated at compile time. action_fifos and flow_keys are per-CPU pointer and have their data allocated at module init time. There is no gain in splitting it, once the module is allocated, the structures are allocated. Merge the three per-CPU variables into ovs_pcpu_storage, adapt callers. Cc: Aaron Conole <aconole@redhat.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: Ilya Maximets <i.maximets@ovn.org> Cc: dev@openvswitch.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-8-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46]Sebastian Andrzej Siewior
nat_keepalive_sk_ipv[46] is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Use sock_bh_locked which has a sock pointer and a local_lock_t. Use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-7-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15xdp: Use nested-BH locking for system_page_poolSebastian Andrzej Siewior
system_page_pool is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a page_pool member (original system_page_pool) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-6-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15ipv6: sr: Use nested-BH locking for hmac_storageSebastian Andrzej Siewior
hmac_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-5-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RTSebastian Andrzej Siewior
The statistics are incremented with raw_cpu_inc() assuming it always happens with bottom half disabled. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this is no longer true. Use this_cpu_inc() on PREEMPT_RT for the increment to not worry about preemption. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-4-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net: dst_cache: Use nested-BH locking for dst_cache::cacheSebastian Andrzej Siewior
dst_cache::cache is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-3-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net: page_pool: Don't recycle into cache on PREEMPT_RTSebastian Andrzej Siewior
With preemptible softirq and no per-CPU locking in local_bh_disable() on PREEMPT_RT the consumer can be preempted while a skb is returned. Avoid the race by disabling the recycle into the cache on PREEMPT_RT. Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-2-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15Merge tag 'mt76-fixes-2025-05-15' of https://github.com/nbd168/wirelessJohannes Berg
Felix Fietkau says: =================== mt76 fix for 6.15 - disable napi on driver removal to fix warning - fix multicast rx regression on mt7925 =================== Link: https://patch.msgid.link/3b526d06-b717-4d47-817c-a9f47b796a31@nbd.name/ Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-15Merge branch 'octeontx2-improve-mailbox-tracing'Paolo Abeni
Subbaraya Sundeep says: ==================== octeontx2: Improve mailbox tracing Octeontx2 VF,PF and AF devices communicate using hardware shared mailbox region where VFs can only to talk to its PFs and PFs can only talk to AF. AF does the entire resource management for all PFs and VFs. The shared mbox region is used for synchronous requests (requests from PF to AF or VF to PF) and async notifications (notifications from AF to PFs or PF to VFs). Sending a request to AF from VF involves various stages like 1. VF allocates message in shared region 2. Triggers interrupt to PF 3. PF upon receiving interrupt from VF will copy the message from VF<->PF region to PF<->AF region 4. Triggers interrupt to AF 5. AF processes it and writes response in PF<->AF region 6. Triggers interrupt to PF 7. PF copies responses from PF<->AF region to VF<->PF region 8. Triggers interrupt to Vf 9. VF reads response in VF<->PF region Due to various stages involved, Tracepoints are used in mailbox code for debugging. Existing tracepoints need some improvements so that maximum information can be inferred from trace logs during an issue. This patchset tries to enhance existing tracepoints and also adds a couple of tracepoints. ==================== Link: https://patch.msgid.link/1747136408-30685-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15octeontx2: Add new tracepoint otx2_msg_statusSubbaraya Sundeep
Apart from netdev interface Octeontx2 PF does the following: 1. Sends its own requests to AF and receives responses from AF. 2. Receives async messages from AF. 3. Forwards VF requests to AF, sends respective responses from AF to VFs. 4. Sends async messages to VFs. This patch adds new tracepoint otx2_msg_status to display the status of PF wrt mailbox handling. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1747136408-30685-5-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15octeontx2: Add pcifunc also to mailbox tracepointsSubbaraya Sundeep
This patch adds pcifunc which represents PF and VF device to the tracepoints otx2_msg_alloc, otx2_msg_send, otx2_msg_process so that it is easier to correlate which device allocated the message, which device forwarded it and which device processed that message. Also add message id in otx2_msg_send tracepoint to check which message is sent at any point of time from a device. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1747136408-30685-4-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15octeontx2-af: Display names for CPT and UP messagesSubbaraya Sundeep
Mailbox UP messages and CPT messages names are not being displayed with their names in trace log files. Add those messages too in otx2_mbox_id2name. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1747136408-30685-3-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15octeontx2-af: convert dev_dbg to tracepoint in mboxSubbaraya Sundeep
Use tracepoint instead of dev_dbg since the entire mailbox code uses tracepoints for debugging. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1747136408-30685-2-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_requestKees Cook
Make sure that n_channels is set after allocating the struct cfg80211_registered_device::int_scan_req member. Seen with syzkaller: UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5 index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]') This was missed in the initial conversions because I failed to locate the allocation likely due to the "sizeof(void *)" not matching the "channels" array type. Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/ Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-15net: Look for bonding slaves in the bond's network namespaceShay Drory
Update the for_each_netdev_in_bond_rcu macro to iterate through network devices in the bond's network namespace instead of always using init_net. This change is safe because: 1. **Bond-Slave Namespace Relationship**: A bond device and its slaves must reside in the same network namespace. The bond device's namespace is established at creation time and cannot change. 2. **Slave Movement Implications**: Any attempt to move a slave device to a different namespace automatically removes it from the bond, as per kernel networking stack rules. This maintains the invariant that slaves must exist in the same namespace as their bond. This change is part of an effort to enable Link Aggregation (LAG) to work properly inside custom network namespaces. Previously, the macro would only find slave devices in the initial network namespace, preventing proper bonding functionality in custom namespaces. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250513081922.525716-1-mbloch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15ovpn: fix check for skb_to_sgvec_nomark() return valueAntonio Quartulli
Depending on the data offset, skb_to_sgvec_nomark() may use less scatterlist elements than what was forecasted by the previous call to skb_cow_data(). It specifically happens when 'skbheadlen(skb) < offset', because in this case we entirely skip the skb's head, which would have required its own scatterlist element. For this reason, it doesn't make sense to check that skb_to_sgvec_nomark() returns the same value as skb_cow_data(), but we can rather check for errors only, as it happens in other parts of the kernel. Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15ovpn: improve 'no route to host' debug messageAntonio Quartulli
When debugging a 'no route to host' error it can be beneficial to know the address of the unreachable destination. Print it along the debugging text. While at it, add a missing parenthesis in a different debugging message inside ovpn_peer_endpoints_update(). Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15ovpn: drop useless reg_state check in keepalive workerAntonio Quartulli
The keepalive worker is cancelled before calling unregister_netdevice_queue(), therefore it will never hit a situation where the reg_state can be different than NETDEV_REGISTERED. For this reason, checking reg_state is useless and the condition can be removed. Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15selftest/net/ovpn: extend coverage with more test casesAntonio Quartulli
To increase code coverage, extend the ovpn selftests with the following cases: * connect UDP peers using a mix of IPv6 and IPv4 at the transport layer * run full test with tunnel MTU equal to transport MTU (exercising IP layer fragmentation) * ping "LAN IP" served by VPN peer ("LAN behind a client" test case) Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15ovpn: fix ndo_start_xmit return value on errorAntonio Quartulli
ndo_start_xmit is basically expected to always return NETDEV_TX_OK. However, in case of error, it was currently returning NET_XMIT_DROP, which is not a valid netdev_tx_t return value, leading to misinterpretation. Change ndo_start_xmit to always return NETDEV_TX_OK to signal back to the caller that the packet was handled (even if dropped). Effects of this bug can be seen when sending IPv6 packets having no peer to forward them to: $ ip netns exec ovpn-server oping -c20 fd00:abcd:220:201::1 PING fd00:abcd:220:201::1 (fd00:abcd:220:201::1) 56 bytes of data.00:abcd:220:201 :1 ping_send failed: No buffer space available ping_sendto: No buffer space available ping_send failed: No buffer space available ping_sendto: No buffer space available ... Fixes: c2d950c4672a ("ovpn: add basic interface creation/destruction/management routines") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/5 Tested-by: Gert Doering <gert@greenie.muc.de> Acked-by: Gert Doering <gert@greenie.muc.de> Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31591.html Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15selftest/net/ovpn: fix crash in case of getaddrinfo() failureAntonio Quartulli
getaddrinfo() may fail with error code different from EAI_FAIL or EAI_NONAME, however in this case we still try to free the results object, thus leading to a crash. Fix this by bailing out on any possible error. Fixes: 959bc330a439 ("testing/selftests: add test tool and scripts for ovpn module") Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15ovpn: don't drop skb's dst when xmitting packetAntonio Quartulli
When routing a packet to a LAN behind a peer, ovpn needs to inspect the route entry that brought the packet there in the first place. If this packet is truly routable, the route entry provides the GW to be used when looking up the VPN peer to send the packet to. However, the route entry is currently dropped before entering the ovpn xmit function, because the IFF_XMIT_DST_RELEASE priv_flag is enabled by default. Clear the IFF_XMIT_DST_RELEASE flag during interface setup to allow the route entry (skb's dst) to survive and thus be inspected by the ovpn routing logic. Fixes: a3aaef8cd173 ("ovpn: implement peer lookup logic") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/2 Tested-by: Gert Doering <gert@greenie.muc.de> Acked-by: Gert Doering <gert@greenie.muc.de> # as a primary user Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31583.html Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15ovpn: set skb->ignore_df = 1 before sending IPv6 packets outAntonio Quartulli
IPv6 user packets (sent over the tunnel) may be larger than the outgoing interface MTU after encapsulation. When this happens ovpn should allow the kernel to fragment them because they are "locally generated". To achieve the above, we must set skb->ignore_df = 1 so that ip6_fragment() can be made aware of this decision. Failing to do so will result in ip6_fragment() dropping the packet thinking it was "routed". No change is required in the IPv4 path, because when calling udp_tunnel_xmit_skb() we already pass the 'df' argument set to 0, therefore the resulting datagram is allowed to be fragmented if need be. Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/3 Tested-by: Gert Doering <gert@greenie.muc.de> Acked-by: Gert Doering <gert@greenie.muc.de> # as primary user Link: https://mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31577.html Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15MAINTAINERS: update git URL for ovpnAntonio Quartulli
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15MAINTAINERS: add Sabrina as official reviewer for ovpnAntonio Quartulli
Sabrina put quite some effort in reviewing the ovpn module during its official submission to netdev. For this reason she obtain extensive knowledge of the module architecture and implementation. Make her an official reviewer, so that I can be supported in reviewing and acking new patches. Acked-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-05-15Merge branch 'eth-fbnic-add-devlink-dev-flash-support'Paolo Abeni
Lee Trager says: ==================== eth: fbnic: Add devlink dev flash support fbnic supports updating firmware using signed PLDM images. PLDM images are written into the flash. Flashing does not interrupt the operation of the device. V4: https://lore.kernel.org/netdev/20250510002851.3247880-1-lee@trager.us/T/#t V3: https://lore.kernel.org/lkml/20241111043058.1251632-1-lee@trager.us/T/ V2: https://lore.kernel.org/all/20241022013941.3764567-1-lee@trager.us/ ==================== Link: https://patch.msgid.link/20250512190109.2475614-1-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15eth: fbnic: Add devlink dev flash supportLee Trager
Add support to update the CMRT and control firmware as well as the UEFI driver on fbnic using devlink dev flash. Make sure the shutdown / quiescence paths like suspend take the devlink lock to prevent them from interrupting the FW flashing process. Signed-off-by: Lee Trager <lee@trager.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512190109.2475614-6-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15eth: fbnic: Add mailbox support for PLDM updatesLee Trager
Add three new mailbox messages to support PLDM upgrades: * FW_START_UPGRADE - Enables driver to request starting a firmware upgrade by specifying the component to be upgraded and its size. * WRITE_CHUNK - Allows firmware to request driver to send a chunk of data at the specified offset. * FINISH_UPGRADE - Allows firmware to cancel the upgrade process and return an error. Signed-off-by: Lee Trager <lee@trager.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512190109.2475614-5-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15eth: fbnic: Add support for multiple concurrent completion messagesLee Trager
Extend fbnic mailbox to support multiple concurrent completion messages at once. This enables fbnic to support running multiple operations at once which depend on a response from firmware via the mailbox. Signed-off-by: Lee Trager <lee@trager.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250512190109.2475614-4-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15eth: fbnic: Accept minimum anti-rollback version from firmwareLee Trager
fbnic supports applying firmware which may not be rolled back. This is implemented in firmware however it is useful for the driver to know the minimum supported firmware version. This will enable the driver validate new firmware before it is sent to the NIC. If it is too old the driver can provide a clear message that the version is too old. Signed-off-by: Lee Trager <lee@trager.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512190109.2475614-3-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15pldmfw: Don't require send_package_data or send_component_table to be definedLee Trager
Not all drivers require send_package_data or send_component_table when updating firmware. Instead of forcing drivers to implement a stub allow these functions to go undefined. Signed-off-by: Lee Trager <lee@trager.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250512190109.2475614-2-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15octeontx2-pf: Do not reallocate all ntuple filtersSubbaraya Sundeep
If ntuple filters count is modified followed by unicast filters count using devlink then the ntuple count set by user is ignored and all the ntuple filters are being reallocated. Fix this by storing the ntuple count set by user. Without this patch, say if user tries to modify ntuple count as 8 followed by ucast filter count as 4 using devlink commands then ntuple count is being reverted to default value 16 i.e, not retaining user set value 8. Fixes: 39c469188b6d ("octeontx2-pf: Add ucast filter count configurability via devlink.") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747054357-5850-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net: phy: marvell-88q2xxx: Enable temperature measurement in probe againDimitri Fedrau
Enabling of the temperature sensor was moved from mv88q2xxx_hwmon_probe to mv88q222x_config_init with the consequence that the sensor is only usable when the PHY is configured. Enable the sensor in mv88q2xxx_hwmon_probe as well to fix this. Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://patch.msgid.link/20250512-marvell-88q2xxx-hwmon-enable-at-probe-v4-1-9256a5c8f603@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtblMing Yen Hsieh
Ensure that the hdr_trans_tlv command is included in the broadcast wtbl to prevent the IPv6 and multicast packet from being dropped by the chip. Cc: stable@vger.kernel.org Fixes: cb1353ef3473 ("wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd") Reported-by: Benjamin Xiao <fossben@pm.me> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://lore.kernel.org/lkml/EmWnO5b-acRH1TXbGnkx41eJw654vmCR-8_xMBaPMwexCnfkvKCdlU5u19CGbaapJ3KRu-l3B-tSUhf8CCQwL0odjo6Cd5YG5lvNeB-vfdg=@pm.me/ Link: https://patch.msgid.link/20250509010421.403022-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-15wifi: mt76: disable napi on driver removalFedor Pchelkin
A warning on driver removal started occurring after commit 9dd05df8403b ("net: warn if NAPI instance wasn't shut down"). Disable tx napi before deleting it in mt76_dma_cleanup(). WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy) Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024 RIP: 0010:__netif_napi_del_locked+0xf0/0x100 Call Trace: <TASK> mt76_dma_cleanup+0x54/0x2f0 [mt76] mt7921_pci_remove+0xd5/0x190 [mt7921e] pci_device_remove+0x47/0xc0 device_release_driver_internal+0x19e/0x200 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xb0 __do_sys_delete_module.isra.0+0x197/0x2e0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Tested with mt7921e but the same pattern can be actually applied to other mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled in their *_dma_init() functions and only toggled off and on again inside their suspend/resume/reset paths. So it should be okay to disable tx napi in such a generic way. Found by Linux Verification Center (linuxtesting.org). Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-14Merge tag 'kbuild-fixes-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Add proper pahole version dependency to CONFIG_GENDWARFKSYMS to avoid module loading errors - Fix UAPI header tests for the OpenRISC architecture - Add dependency on the libdw package in Debian and RPM packages - Disable -Wdefault-const-init-unsafe warnings on Clang - Make "make clean ARCH=um" also clean the arch/x86/ directory - Revert the use of -fmacro-prefix-map=, which causes issues with debugger usability * tag 'kbuild-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: fix typos "module.builtin" to "modules.builtin" Revert "kbuild, rust: use -fremap-path-prefix to make paths relative" Revert "kbuild: make all file references relative to source root" kbuild: fix dependency on sorttable init: remove unused CONFIG_CC_CAN_LINK_STATIC um: let 'make clean' properly clean underlying SUBARCH as well kbuild: Disable -Wdefault-const-init-unsafe kbuild: rpm-pkg: Add (elfutils-devel or libdw-devel) to BuildRequires kbuild: deb-pkg: Add libdw-dev:native to Build-Depends-Arch usr/include: openrisc: don't HDRTEST bpf_perf_event.h kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86
2025-05-14Merge branch 'hv_netvsc-fix-error-nvsp_rndis_pkt_complete-error-status-2'Jakub Kicinski
Michael Kelley says: ==================== hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2" Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. This patch set fixes the root cause of the problem, which is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error in the case of the rndis header. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in a ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer() and just combine them into a single GPA range. Such a fix is very contained. But conceptually it is fixing the problem at the wrong level. So this patch set takes the broader approach of maintaining the already known grouping of contiguous PFNs at a higher level in the netvsc driver code, and propagating that grouping down to the creation of the VMBus message to send to Hyper-V. Maintaining the grouping fixes this problem, and has the added benefit of allowing netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce buffering in CoCo VMs. Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc() to specify multiple GPA ranges. In current code vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI driver, and it always creates a single GPA range. Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of netvsc still don't group contiguous PFNs, this patch is functionally neutral. The VMBus message to Hyper-V still has many GPA ranges, each with a single PFN. But it lays the groundwork for the next patch. Patch 3 changes the higher levels of netvsc to preserve the already known grouping of contiguous PFNs. When the contiguous groupings are passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple PFNs are produced, as expected by Hyper-V. This is point at which the core problem is fixed. Patches 4 and 5 remove code that is no longer necessary after the previous patches. These changes provide a net reduction of about 65 lines of code, which is an added benefit. These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo VMs, and in Dv6-series VMs where the netvsp implementation is in the OpenHCL paravisor instead of the Hyper-V host. These changes are built against kernel version 6.15-rc6. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 ==================== Link: https://patch.msgid.link/20250513000604.1396-1-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()Michael Kelley
With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Remove rmsg_pgcntMichael Kelley
init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Preserve contiguous PFN grouping in the page buffer arrayMichael Kelley
Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost. Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array. This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb. Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messagesMichael Kelley
netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple rangesMichael Kelley
vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor. Update the storvsc driver to reflect this new approach. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: cpsw: isolate cpsw_ndo_ioctl() to just the old driverVladimir Oltean
cpsw->slaves[slave_no].phy should be equal to netdev->phydev, because it is assigned from phy_attach_direct(). The latter is indirectly called from the two identically named cpsw_slave_open() functions, one in cpsw.c and another in cpsw_new.c. Thus, the driver should not need custom logic to find the PHY, the core can find it, and phy_do_ioctl_running() achieves exactly that. However, that is only the case for cpsw_new and for the cpsw driver in dual EMAC mode. This is explained in more detail in the previous commit. Thus, allow the simpler core logic to execute for cpsw_new, and move cpsw_ndo_ioctl() to cpsw.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250512114422.4176010-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>