summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2024-03-06Bluetooth: Add new state HCI_POWERING_DOWNJonas Dreßler
Add a new state HCI_POWERING_DOWN that indicates that the device is currently powering down, this will be useful for the next commit. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove HCI_POWER_OFF_TIMEOUTJonas Dreßler
With commit cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED"), the power off sequence got refactored so that this timeout was no longer necessary, let's remove the leftover define from the header too. Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06mac802154: fix llsec key resources release in mac802154_llsec_key_delFedor Pchelkin
mac802154_llsec_key_del() can free resources of a key directly without following the RCU rules for waiting before the end of a grace period. This may lead to use-after-free in case llsec_lookup_key() is traversing the list of keys in parallel with a key deletion: refcount_t: addition on 0; use-after-free. WARNING: CPU: 4 PID: 16000 at lib/refcount.c:25 refcount_warn_saturate+0x162/0x2a0 Modules linked in: CPU: 4 PID: 16000 Comm: wpan-ping Not tainted 6.7.0 #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:refcount_warn_saturate+0x162/0x2a0 Call Trace: <TASK> llsec_lookup_key.isra.0+0x890/0x9e0 mac802154_llsec_encrypt+0x30c/0x9c0 ieee802154_subif_start_xmit+0x24/0x1e0 dev_hard_start_xmit+0x13e/0x690 sch_direct_xmit+0x2ae/0xbc0 __dev_queue_xmit+0x11dd/0x3c20 dgram_sendmsg+0x90b/0xd60 __sys_sendto+0x466/0x4c0 __x64_sys_sendto+0xe0/0x1c0 do_syscall_64+0x45/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Also, ieee802154_llsec_key_entry structures are not freed by mac802154_llsec_key_del(): unreferenced object 0xffff8880613b6980 (size 64): comm "iwpan", pid 2176, jiffies 4294761134 (age 60.475s) hex dump (first 32 bytes): 78 0d 8f 18 80 88 ff ff 22 01 00 00 00 00 ad de x......."....... 00 00 00 00 00 00 00 00 03 00 cd ab 00 00 00 00 ................ backtrace: [<ffffffff81dcfa62>] __kmem_cache_alloc_node+0x1e2/0x2d0 [<ffffffff81c43865>] kmalloc_trace+0x25/0xc0 [<ffffffff88968b09>] mac802154_llsec_key_add+0xac9/0xcf0 [<ffffffff8896e41a>] ieee802154_add_llsec_key+0x5a/0x80 [<ffffffff8892adc6>] nl802154_add_llsec_key+0x426/0x5b0 [<ffffffff86ff293e>] genl_family_rcv_msg_doit+0x1fe/0x2f0 [<ffffffff86ff46d1>] genl_rcv_msg+0x531/0x7d0 [<ffffffff86fee7a9>] netlink_rcv_skb+0x169/0x440 [<ffffffff86ff1d88>] genl_rcv+0x28/0x40 [<ffffffff86fec15c>] netlink_unicast+0x53c/0x820 [<ffffffff86fecd8b>] netlink_sendmsg+0x93b/0xe60 [<ffffffff86b91b35>] ____sys_sendmsg+0xac5/0xca0 [<ffffffff86b9c3dd>] ___sys_sendmsg+0x11d/0x1c0 [<ffffffff86b9c65a>] __sys_sendmsg+0xfa/0x1d0 [<ffffffff88eadbf5>] do_syscall_64+0x45/0xf0 [<ffffffff890000ea>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 Handle the proper resource release in the RCU callback function mac802154_llsec_key_del_rcu(). Note that if llsec_lookup_key() finds a key, it gets a refcount via llsec_key_get() and locally copies key id from key_entry (which is a list element). So it's safe to call llsec_key_put() and free the list entry after the RCU grace period elapses. Found by Linux Verification Center (linuxtesting.org). Fixes: 5d637d5aabd8 ("mac802154: add llsec structures and mutators") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Alexander Aring <aahringo@redhat.com> Message-ID: <20240228163840.6667-1-pchelkin@ispras.ru> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-03-06inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERTJuntong Deng
Currently getsockopt does not support IP_ROUTER_ALERT and IPV6_ROUTER_ALERT, and we are unable to get the values of these two socket options through getsockopt. This patch adds getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-05net: add helpers for EEE configurationRussell King
Add helpers that phylib and phylink can use to manage EEE configuration and determine whether the MAC should be permitted to use LPI based on that configuration. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240302195306.3207716-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05nfc: core: make nfc_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the nfc_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-6-8fa378595b93@marliere.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05net: gro: enable fast path for more casesEric Dumazet
Currently the so-called GRO fast path is only enabled for napi_frags_skb() callers. After the prior patch, we no longer have to clear frag0 whenever we pulled bytes to skb->head. We therefore can initialize frag0 to skb->data so that GRO fast path can be used in the following additional cases: - Drivers using header split (populating skb->data with headers, and having payload in one or more page fragments). - Drivers not using any page frag (entire packet is in skb->data) Add a likely() in skb_gro_may_pull() to help the compiler to generate better code if possible. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05net: gro: change skb_gro_network_header()Eric Dumazet
Change skb_gro_network_header() to accept a const sk_buff and to no longer check if frag0 is NULL or not. This allows to remove skb_gro_frag0_invalidate() which is seen in profiles when header-split is enabled. sk_buff parameter is constified for skb_gro_header_fast(), inet_gro_compute_pseudo() and ip6_gro_compute_pseudo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05net: gro: rename skb_gro_header_hard()Eric Dumazet
skb_gro_header_hard() is renamed to skb_gro_may_pull() to match the convention used by common helpers like pskb_may_pull(). This means the condition is inverted: if (skb_gro_header_hard(skb, hlen)) slow_path(); becomes: if (!skb_gro_may_pull(skb, hlen)) slow_path(); Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-04wifi: mac80211: introduce a feature flag for quiet in CSAJohannes Berg
When doing CSA in multi-link, there really isn't a need to stop transmissions entirely. Add a feature flag for drivers to indicate they can handle quiet in CSA (be it by parsing themselves, or by implementing drv_pre_channel_switch()), to make that possible. Also clean up the csa_block_tx handling: it clearly cannot handle multi-link due to the way queues are stopped, move it to the sdata. Drivers should be doing it themselves for working properly during CSA in MLO anyway. Also rename it to indicate that it reflects TX was blocked at mac80211. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240228095719.258439191541.I2469d206e2bf5cb244cfde2b4bbc2ae6d1cd3dd9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: mac80211: pass link conf to abort_channel_switchJohannes Berg
Pass the link conf to the abort_channel_switch driver method so the driver can handle things correctly. Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240228095718.27f621106ddd.Iadd3d69b722ffe5934779a32a0e4e596a4e33ed4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: mac80211: pass link_id to channel switch opsJohannes Berg
For CSA to work correctly in multi-link scenarios, pass the link_id to the relevant callbacks. While at it, unify/deduplicate the tracing for them. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://msgid.link/20240228095718.b7726635c054.I0be5d00af4acb48cfbd23a9dbf067f9aeb66469d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: cfg80211: allow cfg80211_defragment_element() without outputJohannes Berg
If we just want to determine the length of the fragmented data, we basically need the same logic, and really we want it to be _literally_ the same logic, so it cannot be out of sync in any way. Allow calling cfg80211_defragment_element() without an output buffer, where it then just returns the required output size. Also add this to the tests, just to exercise it, using the pre-calculated length to really do the defragmentation, which checks that this is sufficient. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://msgid.link/20240228095718.6d6565b9e3f2.Ib441903f4b8644ba04b1c766f90580ee6f54fc66@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: cfg80211: expose cfg80211_iter_rnr() to driversJohannes Berg
In mac80211 we'll need to look at reduced neighbor report entries for channel switch purposes, so export the iteration function to make that simpler. Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240228095718.0954809964ef.I53e95c017aa71f14e8d1057afbbc75982ddb43df@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: mac80211: add ieee80211_vif_link_active() helperJohannes Berg
We sometimes need to check if a link is active, and this is complicated by the fact that active_links has no bits set when the vif isn't (acting as) an MLD. Add a small new helper ieee80211_vif_link_active() to make that a bit easier, and use it in a few places. Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240228094901.688760aff5f7.I06892a503f5ecb9563fbd678d35d08daf7a044b0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-04wifi: mac80211: add link id to ieee80211_gtk_rekey_add()Shaul Triebitz
In MLO, we need the link id in the GTK key to be given by the driver after rekeying in wowlan, so add that. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240228094500.ce1bfc83a680.I43a6f8ab2804ee07116a37d5b9ec601b843464b1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-01net: bql: fix building with BQL disabledArnd Bergmann
It is now possible to disable BQL, but that causes the cpsw driver to break: drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue' 297 | dql_avail(&netif_txq->dql), There is already a helper function in net/sch_generic.h that could be used to help here. Move its implementation into the common linux/netdevice.h along with the other bql interfaces and change both users over to the new interface. Fixes: ea7f3cfaa588 ("net: bql: allow the config to be disabled") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdownEric Dumazet
idev->cnf.ignore_routes_with_linkdown can be used without any locks, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.forwardingEric Dumazet
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.mtu6Eric Dumazet
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28net: ioam6: multicast eventJustin Iurman
Add a multicast group to the ioam6 generic netlink family and provide ioam6_event() to send an ioam6 event to the multicast group. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: make the dropreason really work when calling tcp_rcv_state_process()Jason Xing
Update three callers including both ipv4 and ipv6 and let the dropreason mechanism work in reality. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: add dropreasons in tcp_rcv_state_process()Jason Xing
In this patch, I equipped this function with more dropreasons, but it still doesn't work yet, which I will do later. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: introduce dropreasons in receive pathJason Xing
Soon later patches can use these relatively more accurate reasons to recognise and find out the cause. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: add a dropreason definitions and prepare for cookie checkJason Xing
Adding one drop reason to detect the condition of skb dropped because of hook points in cookie check and extending NO_SOCKET to consider another two cases can be used later. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28net: make SK_MEMORY_PCPU_RESERV tunableAdam Li
This patch adds /proc/sys/net/core/mem_pcpu_rsv sysctl file, to make SK_MEMORY_PCPU_RESERV tunable. Commit 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated") introduced per-cpu forward alloc cache: "Implement a per-cpu cache of +1/-1 MB, to reduce number of changes to sk->sk_prot->memory_allocated, which would otherwise be cause of false sharing." sk_prot->memory_allocated points to global atomic variable: atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp; If increasing the per-cpu cache size from 1MB to e.g. 16MB, changes to sk->sk_prot->memory_allocated can be further reduced. Performance may be improved on system with many cores. Signed-off-by: Adam Li <adamli@os.amperecomputing.com> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: anycast: complete RCU handling of struct ifacaddr6Eric Dumazet
struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26nexthop: allow nexthop_mpath_fill_node() to be called without RTNLEric Dumazet
nexthop_mpath_fill_node() will be potentially called from contexts holding rcu_lock instead of RTNL. Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCUEric Dumazet
Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU. This is used in the following patch, when inet_dump_fib() no longer holds RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flagEric Dumazet
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-23genetlink: make info in GENL_REQ_ATTR_CHECK() constJakub Kicinski
Make the local variable in GENL_REQ_ATTR_CHECK() const. genl_info_dump() returns a const pointer, so the macro is currently hard to use in genl dumps. Link: https://lore.kernel.org/r/20240222222819.156320-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net: mctp: take ownership of skb in mctp_local_outputJeremy Kerr
Currently, mctp_local_output only takes ownership of skb on success, and we may leak an skb if mctp_local_output fails in specific states; the skb ownership isn't transferred until the actual output routing occurs. Instead, make mctp_local_output free the skb on all error paths up to the route action, so it always consumes the passed skb. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'nf-next-24-02-21' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next 1. Prefer KMEM_CACHE() macro to create kmem caches, from Kunwu Chan. Patches 2 and 3 consolidate nf_log NULL checks and introduces extra boundary checks on family and type to make it clear that no out of bounds access will happen. No in-tree user currently passes such values, but thats not clear from looking at the function. From Pablo Neira Ayuso. Patch 4, also from Pablo, gets rid of unneeded conditional in nft_osf init function. Patch 5, from myself, fixes erroneous Kconfig dependencies that came in an earlier net-next pull request. This should get rid of the xtables related build failure reports. Patches 6 to 10 are an update to nftables' concatenated-ranges set type to speed up element insertions. This series also compacts a few data structures and cleans up a few oddities such as reliance on ZERO_SIZE_PTR when asking to allocate a set with no elements. From myself. Patches 11 moves the nf_reinject function from the netfilter core (vmlinux) into the nfnetlink_queue backend, the only location where this is called from. Also from myself. Patch 12, from Kees Cook, switches xtables' compat layer to use unsafe_memcpy because xt_entry_target cannot easily get converted to a real flexible array (its UAPI and used inside other structs). * tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination netfilter: move nf_reinject into nfnetlink_queue modules netfilter: nft_set_pipapo: use GFP_KERNEL for insertions netfilter: nft_set_pipapo: speed up bulk element insertions netfilter: nft_set_pipapo: shrink data structures netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR netfilter: nft_set_pipapo: constify lookup fn args where possible netfilter: xtables: fix up kconfig dependencies netfilter: nft_osf: simplify init path netfilter: nf_log: validate nf_logger_find_get() netfilter: nf_log: consolidate check for NULL logger in lookup function netfilter: expect: Simplify the allocation of slab caches in nf_conntrack_expect_init ==================== Link: https://lore.kernel.org/r/20240221112637.5396-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'wireless-next-2024-02-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.9 The third "new features" pull request for v6.9. This is a quick followup to send commit 04edb5dc68f4 ("wifi: ath12k: Fix uninitialized use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning introduced in the previous pull request. We also have support for QCA2066 in ath11k, several new features in ath12k and few other changes in drivers. In stack it's mostly cleanup and refactoring. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support iwlwifi * mvm: support wider-bandwidth OFDMA * bump firmware API to 90 for BZ/SC devices brcmfmac * DMI nvram filename quirk for ACEPC W5 Pro * tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits) wifi: wilc1000: revert reset line logic flip wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx() wifi: rtlwifi: rtl8192cu: Fix TX aggregation wifi: wilc1000: remove AKM suite be32 conversion for external auth request wifi: nl80211: refactor parsing CSA offsets wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH wifi: iwlwifi: load b0 version of ucode for HR1/HR2 wifi: iwlwifi: handle per-phy statistics from fw wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues wifi: iwlwifi: api: fix kernel-doc reference wifi: iwlwifi: mvm: unlock mvm if there is no primary link wifi: iwlwifi: bump FW API to 90 for BZ/SC devices wifi: iwlwifi: mvm: support PHY context version 6 wifi: iwlwifi: mvm: partially support PHY context version 6 wifi: iwlwifi: mvm: support wider-bandwidth OFDMA wifi: cfg80211: use ML element parsing helpers wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt() wifi: cfg80211: refactor RNR parsing ... ==================== Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net: mctp: separate key correlation across netsJeremy Kerr
Currently, we lookup sk_keys from the entire struct net_namespace, which may contain multiple MCTP net IDs. In those cases we want to distinguish between endpoints with the same EID but different net ID. Add the net ID data to the struct mctp_sk_key, populate on add and filter on this during route lookup. For the ioctl interface, we use a default net of MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net configurations), but we'll extend the ioctl interface to provide net-specific tag allocation in an upcoming change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: avoid confusion over local/peer dest/source addressesJeremy Kerr
We have a double-swap of local and peer addresses in mctp_alloc_local_tag; the arguments in both call sites are swapped, but there is also a swap in the implementation of alloc_local_tag. This is opaque because we're using source/dest address references, which don't match the local/peer semantics. Avoid this confusion by naming the arguments as 'local' and 'peer', and remove the double swap. The calling order now matches mctp_key_alloc. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22netfilter: nft_flow_offload: reset dst in route object after setting up flowPablo Neira Ayuso
dst is transferred to the flow object, route object does not own it anymore. Reset dst in route object, otherwise if flow_offload_add() fails, error path releases dst twice, leading to a refcount underflow. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-21wifi: cfg80211: clean up cfg80211_inform_bss_frame_data()Johannes Berg
Make cfg80211_inform_bss_frame_data() call the existing cfg80211_inform_bss_data() after parsing the frame in the appropriate way, so we have less code duplication. This required introducing a new CFG80211_BSS_FTYPE_S1G_BEACON, but that can be used by other drivers as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240216135047.874aed1eff5f.Ib7d88d126eec50c64763251a78cb432bb5df14df@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21wifi: mac80211: make associated BSS pointer visible to the driverMiri Korenblit
Some drivers need the data in it, so move it to the link conf, which is exposed to the driver. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240206164849.6fe9782b87b4.Ifbffef638f07ca7f5c2b27f40d2cf2942d21de0b@changeid [remove bss pointer from internal struct, update docs] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21wifi: mac80211: check beacon countdown is complete on per link basisAditya Kumar Singh
Currently, function to check if beacon countdown is complete uses deflink to fetch the beacon and check the counter. However, with MLO, there is a need to check the counter for the beacon in a particular link. Add support to use link_id in order to fetch the beacon from a particular link data. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240216144621.514385-2-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21Merge tag 'wireless-next-2024-02-20' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.9 The second "new features" pull request for v6.9. Lots of iwlwifi and stack changes this time. And naturally smaller changes to other drivers. We also twice merged wireless into wireless-next to avoid conflicts between the trees. Major changes: stack * mac80211: negotiated TTLM request support * SPP A-MSDU support * mac80211: wider bandwidth OFDMA config support iwlwifi * kunit tests * bump FW API to 89 for AX/BZ/SC devices * enable SPP A-MSDUs * support for new devices ath12k * refactoring in preparation for Multi-Link Operation (MLO) support * 1024 Block Ack window size support * provide firmware wmi logs via a trace event ath11k * 36 bit DMA mask support * support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) rtl8xxxu * TP-Link TL-WN823N V2 support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21netfilter: move nf_reinject into nfnetlink_queue modulesFlorian Westphal
No need to keep this in the core, move it to the nfnetlink_queue module. nf_reroute is moved too, there were no other callers. Signed-off-by: Florian Westphal <fw@strlen.de>
2024-02-20net: reorganize "struct sock" fieldsEric Dumazet
Last major reorg happened in commit 9115e8cd2a0c ("net: reorganize struct sock for better data locality") Since then, many changes have been done. Before SO_PEEK_OFF support is added to TCP, we need to move sk_peek_off to a better location. It is time to make another pass, and add six groups, without explicit alignment. - sock_write_rx (following sk_refcnt) read-write fields in rx path. - sock_read_rx read-mostly fields in rx path. - sock_read_rxtx read-mostly fields in both rx and tx paths. - sock_write_rxtx read-write fields in both rx and tx paths. - sock_write_tx read-write fields in tx paths. - sock_read_tx read-mostly fields in tx paths. Results on TCP_RR benchmarks seem to show a gain (4 to 5 %). It is possible UDP needs a change, because sk_peek_off shares a cache line with sk_receive_queue. If this the case, we can exchange roles of sk->sk_receive and up->reader_queue queues. After this change, we have the following layout: struct sock { struct sock_common __sk_common; /* 0 0x88 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ __u8 __cacheline_group_begin__sock_write_rx[0]; /* 0x88 0 */ atomic_t sk_drops; /* 0x88 0x4 */ __s32 sk_peek_off; /* 0x8c 0x4 */ struct sk_buff_head sk_error_queue; /* 0x90 0x18 */ struct sk_buff_head sk_receive_queue; /* 0xa8 0x18 */ /* --- cacheline 3 boundary (192 bytes) --- */ struct { atomic_t rmem_alloc; /* 0xc0 0x4 */ int len; /* 0xc4 0x4 */ struct sk_buff * head; /* 0xc8 0x8 */ struct sk_buff * tail; /* 0xd0 0x8 */ } sk_backlog; /* 0xc0 0x18 */ struct { atomic_t rmem_alloc; /* 0 0x4 */ int len; /* 0x4 0x4 */ struct sk_buff * head; /* 0x8 0x8 */ struct sk_buff * tail; /* 0x10 0x8 */ /* size: 24, cachelines: 1, members: 4 */ /* last cacheline: 24 bytes */ }; __u8 __cacheline_group_end__sock_write_rx[0]; /* 0xd8 0 */ __u8 __cacheline_group_begin__sock_read_rx[0]; /* 0xd8 0 */ rcu * sk_rx_dst; /* 0xd8 0x8 */ int sk_rx_dst_ifindex; /* 0xe0 0x4 */ u32 sk_rx_dst_cookie; /* 0xe4 0x4 */ unsigned int sk_ll_usec; /* 0xe8 0x4 */ unsigned int sk_napi_id; /* 0xec 0x4 */ u16 sk_busy_poll_budget; /* 0xf0 0x2 */ u8 sk_prefer_busy_poll; /* 0xf2 0x1 */ u8 sk_userlocks; /* 0xf3 0x1 */ int sk_rcvbuf; /* 0xf4 0x4 */ rcu * sk_filter; /* 0xf8 0x8 */ /* --- cacheline 4 boundary (256 bytes) --- */ union { rcu * sk_wq; /* 0x100 0x8 */ struct socket_wq * sk_wq_raw; /* 0x100 0x8 */ }; /* 0x100 0x8 */ union { rcu * sk_wq; /* 0 0x8 */ struct socket_wq * sk_wq_raw; /* 0 0x8 */ }; void (*sk_data_ready)(struct sock *); /* 0x108 0x8 */ long sk_rcvtimeo; /* 0x110 0x8 */ int sk_rcvlowat; /* 0x118 0x4 */ __u8 __cacheline_group_end__sock_read_rx[0]; /* 0x11c 0 */ __u8 __cacheline_group_begin__sock_read_rxtx[0]; /* 0x11c 0 */ int sk_err; /* 0x11c 0x4 */ struct socket * sk_socket; /* 0x120 0x8 */ struct mem_cgroup * sk_memcg; /* 0x128 0x8 */ rcu * sk_policy[2]; /* 0x130 0x10 */ /* --- cacheline 5 boundary (320 bytes) --- */ __u8 __cacheline_group_end__sock_read_rxtx[0]; /* 0x140 0 */ __u8 __cacheline_group_begin__sock_write_rxtx[0]; /* 0x140 0 */ socket_lock_t sk_lock; /* 0x140 0x20 */ u32 sk_reserved_mem; /* 0x160 0x4 */ int sk_forward_alloc; /* 0x164 0x4 */ u32 sk_tsflags; /* 0x168 0x4 */ __u8 __cacheline_group_end__sock_write_rxtx[0]; /* 0x16c 0 */ __u8 __cacheline_group_begin__sock_write_tx[0]; /* 0x16c 0 */ int sk_write_pending; /* 0x16c 0x4 */ atomic_t sk_omem_alloc; /* 0x170 0x4 */ int sk_sndbuf; /* 0x174 0x4 */ int sk_wmem_queued; /* 0x178 0x4 */ refcount_t sk_wmem_alloc; /* 0x17c 0x4 */ /* --- cacheline 6 boundary (384 bytes) --- */ unsigned long sk_tsq_flags; /* 0x180 0x8 */ union { struct sk_buff * sk_send_head; /* 0x188 0x8 */ struct rb_root tcp_rtx_queue; /* 0x188 0x8 */ }; /* 0x188 0x8 */ union { struct sk_buff * sk_send_head; /* 0 0x8 */ struct rb_root tcp_rtx_queue; /* 0 0x8 */ }; struct sk_buff_head sk_write_queue; /* 0x190 0x18 */ u32 sk_dst_pending_confirm; /* 0x1a8 0x4 */ u32 sk_pacing_status; /* 0x1ac 0x4 */ struct page_frag sk_frag; /* 0x1b0 0x10 */ /* --- cacheline 7 boundary (448 bytes) --- */ struct timer_list sk_timer; /* 0x1c0 0x28 */ /* XXX last struct has 4 bytes of padding */ unsigned long sk_pacing_rate; /* 0x1e8 0x8 */ atomic_t sk_zckey; /* 0x1f0 0x4 */ atomic_t sk_tskey; /* 0x1f4 0x4 */ __u8 __cacheline_group_end__sock_write_tx[0]; /* 0x1f8 0 */ __u8 __cacheline_group_begin__sock_read_tx[0]; /* 0x1f8 0 */ unsigned long sk_max_pacing_rate; /* 0x1f8 0x8 */ /* --- cacheline 8 boundary (512 bytes) --- */ long sk_sndtimeo; /* 0x200 0x8 */ u32 sk_priority; /* 0x208 0x4 */ u32 sk_mark; /* 0x20c 0x4 */ rcu * sk_dst_cache; /* 0x210 0x8 */ netdev_features_t sk_route_caps; /* 0x218 0x8 */ u16 sk_gso_type; /* 0x220 0x2 */ u16 sk_gso_max_segs; /* 0x222 0x2 */ unsigned int sk_gso_max_size; /* 0x224 0x4 */ gfp_t sk_allocation; /* 0x228 0x4 */ u32 sk_txhash; /* 0x22c 0x4 */ u8 sk_pacing_shift; /* 0x230 0x1 */ bool sk_use_task_frag; /* 0x231 0x1 */ __u8 __cacheline_group_end__sock_read_tx[0]; /* 0x232 0 */ u8 sk_gso_disabled:1; /* 0x232: 0 0x1 */ u8 sk_kern_sock:1; /* 0x232:0x1 0x1 */ u8 sk_no_check_tx:1; /* 0x232:0x2 0x1 */ u8 sk_no_check_rx:1; /* 0x232:0x3 0x1 */ /* XXX 4 bits hole, try to pack */ u8 sk_shutdown; /* 0x233 0x1 */ u16 sk_type; /* 0x234 0x2 */ u16 sk_protocol; /* 0x236 0x2 */ unsigned long sk_lingertime; /* 0x238 0x8 */ /* --- cacheline 9 boundary (576 bytes) --- */ struct proto * sk_prot_creator; /* 0x240 0x8 */ rwlock_t sk_callback_lock; /* 0x248 0x8 */ int sk_err_soft; /* 0x250 0x4 */ u32 sk_ack_backlog; /* 0x254 0x4 */ u32 sk_max_ack_backlog; /* 0x258 0x4 */ kuid_t sk_uid; /* 0x25c 0x4 */ spinlock_t sk_peer_lock; /* 0x260 0x4 */ int sk_bind_phc; /* 0x264 0x4 */ struct pid * sk_peer_pid; /* 0x268 0x8 */ const struct cred * sk_peer_cred; /* 0x270 0x8 */ ktime_t sk_stamp; /* 0x278 0x8 */ /* --- cacheline 10 boundary (640 bytes) --- */ int sk_disconnects; /* 0x280 0x4 */ u8 sk_txrehash; /* 0x284 0x1 */ u8 sk_clockid; /* 0x285 0x1 */ u8 sk_txtime_deadline_mode:1; /* 0x286: 0 0x1 */ u8 sk_txtime_report_errors:1; /* 0x286:0x1 0x1 */ u8 sk_txtime_unused:6; /* 0x286:0x2 0x1 */ /* XXX 1 byte hole, try to pack */ void * sk_user_data; /* 0x288 0x8 */ void * sk_security; /* 0x290 0x8 */ struct sock_cgroup_data sk_cgrp_data; /* 0x298 0x8 */ void (*sk_state_change)(struct sock *); /* 0x2a0 0x8 */ void (*sk_write_space)(struct sock *); /* 0x2a8 0x8 */ void (*sk_error_report)(struct sock *); /* 0x2b0 0x8 */ int (*sk_backlog_rcv)(struct sock *, struct sk_buff *); /* 0x2b8 0x8 */ /* --- cacheline 11 boundary (704 bytes) --- */ void (*sk_destruct)(struct sock *); /* 0x2c0 0x8 */ rcu * sk_reuseport_cb; /* 0x2c8 0x8 */ rcu * sk_bpf_storage; /* 0x2d0 0x8 */ struct callback_head sk_rcu __attribute__((__aligned__(8))); /* 0x2d8 0x10 */ netns_tracker ns_tracker; /* 0x2e8 0x8 */ /* size: 752, cachelines: 12, members: 105 */ /* sum members: 749, holes: 1, sum holes: 1 */ /* sum bitfield members: 12 bits, bit holes: 1, sum bit holes: 4 bits */ /* paddings: 1, sum paddings: 4 */ /* forced alignments: 1 */ /* last cacheline: 48 bytes */ }; Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240216162006.2342759-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20net: introduce abstraction for network memoryMina Almasry
Add the netmem_ref type, an abstraction for network memory. To add support for new memory types to the net stack, we must first abstract the current memory type. Currently parts of the net stack use struct page directly: - page_pool - drivers - skb_frag_t Originally the plan was to reuse struct page* for the new memory types, and to set the LSB on the page* to indicate it's not really a page. However, for compiler type checking we need to introduce a new type. netmem_ref is introduced to abstract the underlying memory type. Currently it's a no-op abstraction that is always a struct page underneath. In parallel there is an undergoing effort to add support for devmem to the net stack: https://lore.kernel.org/netdev/20231208005250.2910004-1-almasrymina@google.com/ netmem_ref can be pointers to different underlying memory types, and the low bits are set to indicate the memory type. Helpers are provided to convert netmem pointers to the underlying memory type (currently only struct page). In the devmem series helpers are provided so that calling code can use netmem without worrying about the underlying memory type unless absolutely necessary. Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-19net: page_pool: fix recycle stats for system page_pool allocatorLorenzo Bianconi
Use global percpu page_pool_recycle_stats counter for system page_pool allocator instead of allocating a separate percpu variable for each (also percpu) page pool instance. Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19page_pool: disable direct recycling based on pool->cpuid on destroyAlexander Lobakin
Now that direct recycling is performed basing on pool->cpuid when set, memory leaks are possible: 1. A pool is destroyed. 2. Alloc cache is emptied (it's done only once). 3. pool->cpuid is still set. 4. napi_pp_put_page() does direct recycling basing on pool->cpuid. 5. Now alloc cache is not empty, but it won't ever be freed. In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to make sure no direct recycling will be possible after emptying the cache. This involves a bit of overhead as pool->cpuid now must be accessed via READ_ONCE() to avoid partial reads. Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling() to reflect what it actually does and unexport it. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-18mptcp: fix lockless access in subflow ULP diagPaolo Abeni
Since the introduction of the subflow ULP diag interface, the dump callback accessed all the subflow data with lockless. We need either to annotate all the read and write operation accordingly, or acquire the subflow socket lock. Let's do latter, even if slower, to avoid a diffstat havoc. Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net: bridge: switchdev: Skip MDB replays of deferred events on offloadTobias Waldekranz
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration. While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event. The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed. This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario. To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge. For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it: root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0 And then destroy the bridge: root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$ The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed. Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled: root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1 All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0). Eliminate the race in two steps: 1. Grab the write-side lock of the MDB while generating the replay list. This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore: 2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions. Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>