summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-01-19tcp: avoid the lookup process failing to get sk in ehash tableJason Xing
While one cpu is working on looking up the right socket from ehash table, another cpu is done deleting the request socket and is about to add (or is adding) the big socket from the table. It means that we could miss both of them, even though it has little chance. Let me draw a call trace map of the server side. CPU 0 CPU 1 ----- ----- tcp_v4_rcv() syn_recv_sock() inet_ehash_insert() -> sk_nulls_del_node_init_rcu(osk) __inet_lookup_established() -> __sk_nulls_add_node_rcu(sk, list) Notice that the CPU 0 is receiving the data after the final ack during 3-way shakehands and CPU 1 is still handling the final ack. Why could this be a real problem? This case is happening only when the final ack and the first data receiving by different CPUs. Then the server receiving data with ACK flag tries to search one proper established socket from ehash table, but apparently it fails as my map shows above. After that, the server fetches a listener socket and then sends a RST because it finds a ACK flag in the skb (data), which obeys RST definition in RFC 793. Besides, Eric pointed out there's one more race condition where it handles tw socket hashdance. Only by adding to the tail of the list before deleting the old one can we avoid the race if the reader has already begun the bucket traversal and it would possibly miss the head. Many thanks to Eric for great help from beginning to end. Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/ Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18net: sched: gred: prevent races when adding offloads to statsJakub Kicinski
Naresh reports seeing a warning that gred is calling u64_stats_update_begin() with preemption enabled. Arnd points out it's coming from _bstats_update(). We should be holding the qdisc lock when writing to stats, they are also updated from the datapath. Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/all/CA+G9fYsTr9_r893+62u6UGD3dVaCE-kN9C-Apmb2m=hxjc1Cqg@mail.gmail.com/ Fixes: e49efd5288bd ("net: sched: gred: support reporting stats from offloads") Link: https://lore.kernel.org/r/20230113044137.1383067-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18Merge tag 'wireless-2023-01-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.2 Third set of fixes for v6.2. This time most of them are for drivers, only one revert for mac80211. For an important mt76 fix we had to cherry pick two commits from wireless-next. * tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()" wifi: mt76: dma: fix a regression in adding rx buffers wifi: mt76: handle possible mt76_rx_token_consume failures wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device wifi: brcmfmac: avoid handling disabled channels for survey dump ==================== Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18mm: remove zap_page_range and create zap_vma_pagesMike Kravetz
zap_page_range was originally designed to unmap pages within an address range that could span multiple vmas. While working on [1], it was discovered that all callers of zap_page_range pass a range entirely within a single vma. In addition, the mmu notification call within zap_page range does not correctly handle ranges that span multiple vmas. When crossing a vma boundary, a new mmu_notifier_range_init/end call pair with the new vma should be made. Instead of fixing zap_page_range, do the following: - Create a new routine zap_vma_pages() that will remove all pages within the passed vma. Most users of zap_page_range pass the entire vma and can use this new routine. - For callers of zap_page_range not passing the entire vma, instead call zap_page_range_single(). - Remove zap_page_range. [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/ Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18fs: port vfs_*() helpers to struct mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18mac80211: support minimal EHT rate reporting on RXJohannes Berg
Add minimal support for RX EHT rate reporting, not yet adding (modifying) any radiotap headers, just statistics for cfg80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: mac80211: Add HE MU-MIMO related flags in ieee80211_bss_confMuna Sinada
Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and Full Bandwidth UL MU-MIMO for HE. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada <quic_msinada@quicinc.com> Link: https://lore.kernel.org/r/1665006886-23874-2-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: mac80211: Add VHT MU-MIMO related flags in ieee80211_bss_confMuna Sinada
Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and MU Beamformee for VHT. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada <quic_msinada@quicinc.com> Link: https://lore.kernel.org/r/1665006886-23874-1-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: Support 32 bytes KCK key in GTK rekey offloadShivani Baranwal
Currently, maximum KCK key length supported for GTK rekey offload is 24 bytes but with some newer AKMs the KCK key length can be 32 bytes. e.g., 00-0F-AC:24 AKM suite with SAE finite cyclic group 21. Add support to allow 32 bytes KCK keys in GTK rekey offload. Signed-off-by: Shivani Baranwal <quic_shivbara@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206143715.1802987-3-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data()Shivani Baranwal
The extended KCK key length check wrongly using the KEK key attribute for validation. Due to this GTK rekey offload is failing when the KCK key length is 24 bytes even though the driver advertising WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK flag. Use correct attribute to fix the same. Fixes: 093a48d2aa4b ("cfg80211: support bigger kek/kck key length") Signed-off-by: Shivani Baranwal <quic_shivbara@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206143715.1802987-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: remove support for static WEPJohannes Berg
This reverts commit b8676221f00d ("cfg80211: Add support for static WEP in the driver") since no driver ever ended up using it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18l2tp: prevent lockdep issue in l2tp_tunnel_register()Eric Dumazet
lockdep complains with the following lock/unlock sequence: lock_sock(sk); write_lock_bh(&sk->sk_callback_lock); [1] release_sock(sk); [2] write_unlock_bh(&sk->sk_callback_lock); We need to swap [1] and [2] to fix this issue. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Reported-by: syzbot+bbd35b345c7cab0d9a08@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20230114030137.672706-1-xiyou.wangcong@gmail.com/T/#m1164ff20628671b0f326a24cb106ab3239c70ce3 Cc: Cong Wang <cong.wang@bytedance.com> Cc: Guillaume Nault <gnault@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18xdp: document xdp_do_flush() before napi_complete_done()Magnus Karlsson
Document in the XDP_REDIRECT manual section that drivers must call xdp_do_flush() before napi_complete_done(). The two reasons behind this can be found following the links below. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/ Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Niera Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Fix syn-retransmits until initiator gives up when connection is re-used due to rst marked as invalid, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller
Florian Westphal says: ==================== Netfilter updates for net-next following patch set includes netfilter updates for your *net-next* tree. 1. Replace pr_debug use with nf_log infra for debugging in sctp conntrack. 2. Remove pr_debug calls, they are either useless or we have better options in place. 3. Avoid repeated load of ct->status in some spots. Some bit-flags cannot change during the lifeetime of a connection, so no need to re-fetch those. 4. Avoid uneeded nesting of rcu_read_lock during tuple lookup. 5. Remove the CLUSTERIP target. Marked as obsolete for years, and we still have WARN splats wrt. races of the out-of-band /proc interface installed by this target. 6. Add static key to nf_tables to avoid the retpoline mitigation if/else if cascade provided the cpu doesn't need the retpoline thunk. 7. add nf_tables objref calls to the retpoline mitigation workaround. 8. Split parts of nft_ct.c that do not need symbols exported by the conntrack modules and place them in nf_tables directly. This allows to avoid indirect call for 'ct status' checks. 9. Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not indicate an error if the referenced object (set, chain, rule...) did not exist, from Fernando. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18ipv6: Remove extra counter pull before gcTanmay Bhushan
Per cpu entries are no longer used in consideration for doing gc or not. Remove the extra per cpu entries pull to directly check for time and perform gc. Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18netfilter: nf_tables: add support to destroy operationFernando Fernandez Mancera
Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a delete operation but ignoring the ENOENT errors. This is useful for the transaction semantics, where failing to delete an object which does not exist results in aborting the transaction. This new command allows the transaction to proceed in case the object does not exist. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: nf_tables: avoid retpoline overhead for some ct expression callsFlorian Westphal
nft_ct expression cannot be made builtin to nf_tables without also forcing the conntrack itself to be builtin. However, this can be avoided by splitting retrieval of a few selector keys that only need to access the nf_conn structure, i.e. no function calls to nf_conntrack code. Many rulesets start with something like "ct status established,related accept" With this change, this no longer requires an indirect call, which gives about 1.8% more throughput with a simple conntrack-enabled forwarding test (retpoline thunk used). Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: nf_tables: avoid retpoline overhead for objref callsFlorian Westphal
objref expression is builtin, so avoid calls to it for RETOLINE=y builds. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: nf_tables: add static key to skip retpoline workaroundsFlorian Westphal
If CONFIG_RETPOLINE is enabled nf_tables avoids indirect calls for builtin expressions. On newer cpus indirect calls do not go through the retpoline thunk anymore, even for RETPOLINE=y builds. Just like with the new tc retpoline wrappers: Add a static key to skip the if / else if cascade if the cpu does not require retpolines. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: ip_tables: remove clusterip targetFlorian Westphal
Marked as 'to be removed soon' since kernel 4.1 (2015). Functionality was superseded by the 'cluster' match, added in kernel 2.6.30 (2009). clusterip_tg_check still has races that can give proc_dir_entry 'ipt_CLUSTERIP/10.1.1.2' already registered followed by a WARN splat. Remove it instead of trying to fix this up again. clusterip uapi header is left as-is for now. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: conntrack: move rcu read lock to nf_conntrack_find_getFlorian Westphal
Move rcu_read_lock/unlock to nf_conntrack_find_get(), this avoids nested rcu_read_lock call from resolve_normal_ct(). Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: conntrack: avoid reload of ct->statusFlorian Westphal
Compiler can't merge the two test_bit() calls, so load ct->status once and use non-atomic accesses. This is fine because IPS_EXPECTED or NAT_CLASH are either set at ct creation time or not at all, but compiler can't know that. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: conntrack: remove pr_debug callsFlorian Westphal
Those are all useless or dubious. getorigdst() is called via setsockopt, so return value/errno will already indicate an appropriate error. For other pr_debug calls there are better replacements, such as slab/slub debugging or 'conntrack -E' (ctnetlink events). Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: conntrack: sctp: use nf log infrastructure for invalid packetsFlorian Westphal
The conntrack logging facilities include useful info such as in/out interface names and packet headers. Use those in more places instead of pr_debug calls. Furthermore, several pr_debug calls can be removed, they are useless on production machines due to the sheer volume of log messages. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-17Bluetooth: Fix possible deadlock in rfcomm_sk_state_changeYing Hsu
syzbot reports a possible deadlock in rfcomm_sk_state_change [1]. While rfcomm_sock_connect acquires the sk lock and waits for the rfcomm lock, rfcomm_sock_release could have the rfcomm lock and hit a deadlock for acquiring the sk lock. Here's a simplified flow: rfcomm_sock_connect: lock_sock(sk) rfcomm_dlc_open: rfcomm_lock() rfcomm_sock_release: rfcomm_sock_shutdown: rfcomm_lock() __rfcomm_dlc_close: rfcomm_k_state_change: lock_sock(sk) This patch drops the sk lock before calling rfcomm_dlc_open to avoid the possible deadlock and holds sk's reference count to prevent use-after-free after rfcomm_dlc_open completes. Reported-by: syzbot+d7ce59...@syzkaller.appspotmail.com Fixes: 1804fdf6e494 ("Bluetooth: btintel: Combine setting up MSFT extension") Link: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 [1] Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: ISO: Fix possible circular locking dependencyLuiz Augusto von Dentz
This attempts to fix the following trace: iso-tester/52 is trying to acquire lock: ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at: iso_sock_listen+0x29e/0x440 but task is already holding lock: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x1a3/0x630 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 hci_le_remote_feat_complete_evt+0x17e/0x320 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #0 (&hdev->lock){+.+.}-{3:3}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 iso_sock_listen+0x29e/0x440 __sys_listen+0xe6/0x160 __x64_sys_listen+0x25/0x30 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by iso-tester/52: #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: hci_event: Fix Invalid wait contextLuiz Augusto von Dentz
This fixes the following trace caused by attempting to lock cmd_sync_work_lock while holding the rcu_read_lock: kworker/u3:2/212 is trying to lock: ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at: hci_cmd_sync_queue+0xad/0x140 other info that might help us debug this: context-{4:4} 4 locks held by kworker/u3:2/212: #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at: hci_cc_le_set_cig_params+0x64/0x4f0 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at: hci_cc_le_set_cig_params+0x2f9/0x4f0 Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: ISO: Fix possible circular locking dependencyLuiz Augusto von Dentz
This attempts to fix the following trace: kworker/u3:1/184 is trying to acquire lock: ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_connect_cfm+0x2de/0x690 but task is already holding lock: ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at: iso_connect_cfm+0x265/0x690 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&conn->lock){+.+.}-{2:2}: lock_acquire+0x176/0x3d0 _raw_spin_lock+0x2a/0x40 __iso_sock_close+0x1dd/0x4f0 iso_sock_release+0xa0/0x1b0 sock_close+0x5e/0x120 __fput+0x102/0x410 task_work_run+0xf1/0x160 exit_to_user_mode_prepare+0x170/0x180 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x2de/0x690 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); *** DEADLOCK *** Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: hci_sync: fix memory leak in hci_update_adv_data()Zhengchao Shao
When hci_cmd_sync_queue() failed in hci_update_adv_data(), inst_ptr is not freed, which will cause memory leak, convert to use ERR_PTR/PTR_ERR to pass the instance to callback so no memory needs to be allocated. Fixes: 651cd3d65b0f ("Bluetooth: convert hci_update_adv_data to hci_sync") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: hci_conn: Fix memory leaksZhengchao Shao
When hci_cmd_sync_queue() failed in hci_le_terminate_big() or hci_le_big_terminate(), the memory pointed by variable d is not freed, which will cause memory leak. Add release process to error path. Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2Luiz Augusto von Dentz
Don't try to use HCI_OP_LE_READ_BUFFER_SIZE_V2 if controller don't support ISO channels, but in order to check if ISO channels are supported HCI_OP_LE_READ_LOCAL_FEATURES needs to be done earlier so the features bits can be checked on hci_le_read_buffer_size_sync. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216817 Fixes: c1631dbc00c1 ("Bluetooth: hci_sync: Fix hci_read_buffer_size_sync") Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17Bluetooth: Fix a buffer overflow in mgmt_mesh_add()Harshit Mogalapalli
Smatch Warning: net/bluetooth/mgmt_util.c:375 mgmt_mesh_add() error: __memcpy() 'mesh_tx->param' too small (48 vs 50) Analysis: 'mesh_tx->param' is array of size 48. This is the destination. u8 param[sizeof(struct mgmt_cp_mesh_send) + 29]; // 19 + 29 = 48. But in the caller 'mesh_send' we reject only when len > 50. len > (MGMT_MESH_SEND_SIZE + 31) // 19 + 31 = 50. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-01-17netfilter: conntrack: handle tcp challenge acks during connection reuseFlorian Westphal
When a connection is re-used, following can happen: [ connection starts to close, fin sent in either direction ] > syn # initator quickly reuses connection < ack # peer sends a challenge ack > rst # rst, sequence number == ack_seq of previous challenge ack > syn # this syn is expected to pass Problem is that the rst will fail window validation, so it gets tagged as invalid. If ruleset drops such packets, we get repeated syn-retransmits until initator gives up or peer starts responding with syn/ack. Before the commit indicated in the "Fixes" tag below this used to work: The challenge-ack made conntrack re-init state based on the challenge ack itself, so the following rst would pass window validation. Add challenge-ack support: If we get ack for syn, record the ack_seq, and then check if the rst sequence number matches the last ack number seen in reverse direction. Fixes: c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only") Reported-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-17Drivers: hv: Make remove callback of hyperv driver void returnedDawei Li
Since commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned, it doesn't make much sense for any bus based driver implementing remove callbalk to return non-void to its caller. As such, change the remove function for Hyper-V VMBus based drivers to return void. Signed-off-by: Dawei Li <set_pte_at@outlook.com> Link: https://lore.kernel.org/r/TYCP286MB2323A93C55526E4DF239D3ACCAFA9@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-01-17HID: Make lowlevel driver structs constThomas Weißschuh
Nothing is nor should be modifying these structs so mark them as const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: David Rheinsberg <david.rheinsberg@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-01-17HID: Unexport struct hidp_hid_driverThomas Weißschuh
As there are no external users this implementation detail does not need to be exported. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: David Rheinsberg <david.rheinsberg@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-01-17Merge wireless into wireless-nextKalle Valo
Due to the two cherry picked commits from wireless to wireless-next we have several conflicts in mt76. To avoid any bugs with conflicts merge wireless into wireless-next. 96f134dc1964 wifi: mt76: handle possible mt76_rx_token_consume failures fe13dad8992b wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails
2023-01-17inet: fix fast path in __inet_hash_connect()Pietro Borrello
__inet_hash_connect() has a fast path taken if sk_head(&tb->owners) is equal to the sk parameter. sk_head() returns the hlist_entry() with respect to the sk_node field. However entries in the tb->owners list are inserted with respect to the sk_bind_node field with sk_add_bind_node(). Thus the check would never pass and the fast path never execute. This fast path has never been executed or tested as this bug seems to be present since commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), thus remove it to reduce code complexity. Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230112-inet_hash_connect_bind_head-v3-1-b591fd212b93@diag.uniroma1.it Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-17net: kfree_skb_list use kmem_cache_free_bulkJesper Dangaard Brouer
The kfree_skb_list function walks SKB (via skb->next) and frees them individually to the SLUB/SLAB allocator (kmem_cache). It is more efficient to bulk free them via the kmem_cache_free_bulk API. This patches create a stack local array with SKBs to bulk free while walking the list. Bulk array size is limited to 16 SKBs to trade off stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" uses objsize 256 bytes usually in an order-1 page 8192 bytes that is 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, for SLUB the optimal bulk free case is 32 objects belonging to same slab, but runtime this isn't likely to occur. The expected gain from using kmem_cache bulk alloc and free API have been assessed via a microbencmark kernel module[1]. The module 'slab_bulk_test01' results at bulk 16 element: kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) More detailed description of benchmarks avail in [2]. [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org V2: rename function to kfree_skb_add_bulk. Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-17net: fix call location in kfree_skb_list_reasonJesper Dangaard Brouer
The SKB drop reason uses __builtin_return_address(0) to give the call "location" to trace_kfree_skb() tracepoint skb:kfree_skb. To keep this stable for compilers kfree_skb_reason() is annotated with __fix_address (noinline __noclone) as fixed in commit c205cc7534a9 ("net: skb: prevent the split of kfree_skb_reason() by gcc"). The function kfree_skb_list_reason() invoke kfree_skb_reason(), which cause the __builtin_return_address(0) "location" to report the unexpected address of kfree_skb_list_reason. Example output from 'perf script': kpktgend_0 1337 [000] 81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP Patch creates an __always_inline __kfree_skb_reason() helper call that is called from both kfree_skb_list() and kfree_skb_list_reason(). Suggestions for solutions that shares code better are welcome. As preparation for next patch move __kfree_skb() invocation out of this helper function. Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-17devlink: remove some unnecessary codeDan Carpenter
This code checks if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) twice. Once at the start of the function and then a couple lines later. Delete the second check since that one must be true. Because the second condition is always true, it means the: policer_item = group_item->policer_item; assignment is immediately over-written. Delete that as well. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/Y8EJz8oxpMhfiPUb@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-17sched: add new attr TCA_EXT_WARN_MSG to report tc extact messageHangbin Liu
We will report extack message if there is an error via netlink_ack(). But if the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged. In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo made cls could log verbose info for offloading failures, which helps improving Open vSwitch debuggability when using flower offloading. It would also be helpful if userspace monitor tools, like "tc monitor", could log this kind of message, as it doesn't require vswitchd log level adjusment. Let's add a new tc attributes to report the extack message so the monitor program could receive the failures. e.g. # tc monitor added chain dev enp3s0f1np1 parent ffff: chain 0 added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1 ct_state +trk+new not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 Warning: mlx5_core: matching on ct_state +new isn't supported. In this patch I only report the extack message on add/del operations. It doesn't look like we need to report the extack message on get/dump operations. Note this message not only reporte to multicast groups, it could also be reported unicast, which may affect the current usersapce tool's behaivor. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-16Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"Eric Dumazet
This reverts commit 13e5afd3d773c6fc6ca2b89027befaaaa1ea7293. ieee80211_if_free() is already called from free_netdev(ndev) because ndev->priv_destructor == ieee80211_if_free syzbot reported: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 0 PID: 10041 Comm: syz-executor.0 Not tainted 6.2.0-rc2-syzkaller-00388-g55b98837e37d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 RIP: 0010:pcpu_get_page_chunk mm/percpu.c:262 [inline] RIP: 0010:pcpu_chunk_addr_search mm/percpu.c:1619 [inline] RIP: 0010:free_percpu mm/percpu.c:2271 [inline] RIP: 0010:free_percpu+0x186/0x10f0 mm/percpu.c:2254 Code: 80 3c 02 00 0f 85 f5 0e 00 00 48 8b 3b 48 01 ef e8 cf b3 0b 00 48 ba 00 00 00 00 00 fc ff df 48 8d 78 20 48 89 f9 48 c1 e9 03 <80> 3c 11 00 0f 85 3b 0e 00 00 48 8b 58 20 48 b8 00 00 00 00 00 fc RSP: 0018:ffffc90004ba7068 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88823ffe2b80 RCX: 0000000000000004 RDX: dffffc0000000000 RSI: ffffffff81c1f4e7 RDI: 0000000000000020 RBP: ffffe8fffe8fc220 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 1ffffffff2179ab2 R12: ffff8880b983d000 R13: 0000000000000003 R14: 0000607f450fc220 R15: ffff88823ffe2988 FS: 00007fcb349de700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32220000 CR3: 000000004914f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> netdev_run_todo+0x6bf/0x1100 net/core/dev.c:10352 ieee80211_register_hw+0x2663/0x4040 net/mac80211/main.c:1411 mac80211_hwsim_new_radio+0x2537/0x4d80 drivers/net/wireless/mac80211_hwsim.c:4583 hwsim_new_radio_nl+0xa09/0x10f0 drivers/net/wireless/mac80211_hwsim.c:5176 genl_family_rcv_msg_doit.isra.0+0x1e6/0x2d0 net/netlink/genetlink.c:968 genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline] genl_rcv_msg+0x4ff/0x7e0 net/netlink/genetlink.c:1065 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 13e5afd3d773 ("wifi: mac80211: fix memory leak in ieee80211_if_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Zhengchao Shao <shaozhengchao@huawei.com> Cc: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230113124326.3533978-1-edumazet@google.com
2023-01-16l2tp: close all race conditions in l2tp_tunnel_register()Cong Wang
The code in l2tp_tunnel_register() is racy in several ways: 1. It modifies the tunnel socket _after_ publishing it. 2. It calls setup_udp_tunnel_sock() on an existing socket without locking. 3. It changes sock lock class on fly, which triggers many syzbot reports. This patch amends all of them by moving socket initialization code before publishing and under sock lock. As suggested by Jakub, the l2tp lockdep class is not necessary as we can just switch to bh_lock_sock_nested(). Fixes: 37159ef2c1ae ("l2tp: fix a lockdep splat") Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation") Reported-by: syzbot+52866e24647f9a23403f@syzkaller.appspotmail.com Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16l2tp: convert l2tp_tunnel_list to idrCong Wang
l2tp uses l2tp_tunnel_list to track all registered tunnels and to allocate tunnel ID's. IDR can do the same job. More importantly, with IDR we can hold the ID before a successful registration so that we don't need to worry about late error handling, it is not easy to rollback socket changes. This is a preparation for the following fix. Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio/vsock: replace virtio_vsock_pkt with sk_buffBobby Eshleman
This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput. This patch introduces the following performance changes: Tool: Uperf Env: Phys Host + L1 Guest Payload: 64k Threads: 16 Test Runs: 10 Type: SOCK_STREAM Before: commit b7bfaa761d760 ("Linux 6.2-rc3") Before ------ g2h: 16.77Gb/s h2g: 10.56Gb/s After ----- g2h: 21.04Gb/s h2g: 10.76Gb/s Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16net/sched: sch_taprio: fix possible use-after-freeEric Dumazet
syzbot reported a nasty crash [1] in net_tx_action() which made little sense until we got a repro. This repro installs a taprio qdisc, but providing an invalid TCA_RATE attribute. qdisc_create() has to destroy the just initialized taprio qdisc, and taprio_destroy() is called. However, the hrtimer used by taprio had already fired, therefore advance_sched() called __netif_schedule(). Then net_tx_action was trying to use a destroyed qdisc. We can not undo the __netif_schedule(), so we must wait until one cpu serviced the qdisc before we can proceed. Many thanks to Alexander Potapenko for his help. [1] BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline] BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline] BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline] BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138 queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline] do_raw_spin_trylock include/linux/spinlock.h:191 [inline] __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline] _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138 spin_trylock include/linux/spinlock.h:359 [inline] qdisc_run_begin include/net/sch_generic.h:187 [inline] qdisc_run+0xee/0x540 include/net/pkt_sched.h:125 net_tx_action+0x77c/0x9a0 net/core/dev.c:5086 __do_softirq+0x1cc/0x7fb kernel/softirq.c:571 run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934 smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164 kthread+0x31b/0x430 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 Uninit was created at: slab_post_alloc_hook mm/slab.h:732 [inline] slab_alloc_node mm/slub.c:3258 [inline] __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970 kmalloc_reserve net/core/skbuff.c:358 [inline] __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430 alloc_skb include/linux/skbuff.h:1257 [inline] nlmsg_new include/net/netlink.h:953 [inline] netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436 netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507 rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536 __sys_sendmsg net/socket.c:2565 [inline] __do_sys_sendmsg net/socket.c:2574 [inline] __se_sys_sendmsg net/socket.c:2572 [inline] __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>