summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-04netfilter: nft_set_hash: skip duplicated elements pending gc runPablo Neira Ayuso
rhashtable does not provide stable walk, duplicated elements are possible in case of resizing. I considered that checking for errors when calling rhashtable_walk_next() was sufficient to detect the resizing. However, rhashtable_walk_next() returns -EAGAIN only at the end of the iteration, which is too late, because a gc work containing duplicated elements could have been already scheduled for removal to the worker. Add a u32 gc worker sequence number per set, bump it on every workqueue run. Annotate gc worker sequence number on the expired element. Use it to skip those already seen in this gc workqueue run. Note that this new field is never reset in case gc transaction fails, so next gc worker run on the expired element overrides it. Wraparound of gc worker sequence number should not be an issue with stale gc worker sequence number in the element, that would just postpone the element removal in one gc run. Note that it is not possible to use flags to annotate that element is pending gc run to detect duplicates, given that gc transaction can be invalidated in case of update from the control plane, therefore, not allowing to clear such flag. On x86_64, pahole reports no changes in the size of nft_rhash_elem. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Tested-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-12-04Merge tag 'loongarch-fixes-6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix bugs about EFI screen info, hugetlb pte clear and Lockdep-RCU splat in KVM, plus some trival cleanups" * tag 'loongarch-fixes-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCU LoongArch: KVM: Protect kvm_check_requests() with SRCU LoongArch: BPF: Adjust the parameter of emit_jirl() LoongArch: Add architecture specific huge_pte_clear() LoongArch/irq: Use seq_put_decimal_ull_width() for decimal values LoongArch: Fix reserving screen info memory for above-4G firmware
2024-12-04Merge tag 'platform-drivers-x86-v6.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - asus-nb-wmi: Silence unknown event warning when charger is plugged in - asus-wmi: Handle return code variations during thermal policy writing graciously - samsung-laptop: Correct module description * tag 'platform-drivers-x86-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-nb-wmi: Ignore unknown event 0xCF platform/x86: asus-wmi: Ignore return value when writing thermal policy platform/x86: samsung-laptop: Match MODULE_DESCRIPTION() to functionality
2024-12-04tracing: Fix cmp_entries_dup() to respect sort() comparison rulesKuan-Wei Chiu
The cmp_entries_dup() function used as the comparator for sort() violated the symmetry and transitivity properties required by the sorting algorithm. Specifically, it returned 1 whenever memcmp() was non-zero, which broke the following expectations: * Symmetry: If x < y, then y > x. * Transitivity: If x < y and y < z, then x < z. These violations could lead to incorrect sorting and failure to correctly identify duplicate elements. Fix the issue by directly returning the result of memcmp(), which adheres to the required comparison properties. Cc: stable@vger.kernel.org Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map") Link: https://lore.kernel.org/20241203202228.1274403-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-04netfilter: ipset: Hold module reference while requesting a modulePhil Sutter
User space may unload ip_set.ko while it is itself requesting a set type backend module, leading to a kernel crash. The race condition may be provoked by inserting an mdelay() right after the nfnl_unlock() call. Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-12-04net: sched: fix ordering of qlen adjustmentLion Ackermann
Changes to sch->q.qlen around qdisc_tree_reduce_backlog() need to happen _before_ a call to said function because otherwise it may fail to notify parent qdiscs when the child is about to become empty. Signed-off-by: Lion Ackermann <nnamrec@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-04net: sched: fix erspan_opt settings in cls_flowerXin Long
When matching erspan_opt in cls_flower, only the (version, dir, hwid) fields are relevant. However, in fl_set_erspan_opt() it initializes all bits of erspan_opt and its mask to 1. This inadvertently requires packets to match not only the (version, dir, hwid) fields but also the other fields that are unexpectedly set to 1. This patch resolves the issue by ensuring that only the (version, dir, hwid) fields are configured in fl_set_erspan_opt(), leaving the other fields to 0 in erspan_opt. Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-03ethtool: Fix access to uninitialized fields in set RXNFC commandGal Pressman
The check for non-zero ring with RSS is only relevant for ETHTOOL_SRXCLSRLINS command, in other cases the check tries to access memory which was not initialized by the userspace tool. Only perform the check in case of ETHTOOL_SRXCLSRLINS. Without this patch, filter deletion (for example) could statistically result in a false error: # ethtool --config-ntuple eth3 delete 484 rmgr: Cannot delete RX class rule: Invalid argument Cannot delete classification rule Fixes: 9e43ad7a1ede ("net: ethtool: only allow set_rxnfc with rss + ring_cookie if driver opts in") Link: https://lore.kernel.org/netdev/871a9ecf-1e14-40dd-bbd7-e90c92f89d47@nvidia.com/ Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20241202164805.1637093-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-03Revert "udp: avoid calling sock_def_readable() if possible"Fernando Fernandez Mancera
This reverts commit 612b1c0dec5bc7367f90fc508448b8d0d7c05414. On a scenario with multiple threads blocking on a recvfrom(), we need to call sock_def_readable() on every __udp_enqueue_schedule_skb() otherwise the threads won't be woken up as __skb_wait_for_more_packets() is using prepare_to_wait_exclusive(). Link: https://bugzilla.redhat.com/2308477 Fixes: 612b1c0dec5b ("udp: avoid calling sock_def_readable() if possible") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241202155620.1719-1-ffmancera@riseup.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-03net: Make napi_hash_lock irq safeJoe Damato
Make napi_hash_lock IRQ safe. It is used during the control path, and is taken and released in napi_hash_add and napi_hash_del, which will typically be called by calls to napi_enable and napi_disable. This change avoids a deadlock in pcnet32 (and other any other drivers which follow the same pattern): CPU 0: pcnet32_open spin_lock_irqsave(&lp->lock, ...) napi_enable napi_hash_add <- before this executes, CPU 1 proceeds spin_lock(napi_hash_lock) [...] spin_unlock_irqrestore(&lp->lock, flags); CPU 1: pcnet32_close napi_disable napi_hash_del spin_lock(napi_hash_lock) < INTERRUPT > pcnet32_interrupt spin_lock(lp->lock) <- DEADLOCK Changing the napi_hash_lock to be IRQ safe prevents the IRQ from firing on CPU 1 until napi_hash_lock is released, preventing the deadlock. Cc: stable@vger.kernel.org Fixes: 86e25f40aa1e ("net: napi: Add napi_config") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/netdev/85dd4590-ea6b-427d-876a-1d8559c7ad82@roeck-us.net/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241202182103.363038-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-03netfilter: nft_inner: incorrect percpu area handling under softirqPablo Neira Ayuso
Softirq can interrupt ongoing packet from process context that is walking over the percpu area that contains inner header offsets. Disable bh and perform three checks before restoring the percpu inner header offsets to validate that the percpu area is valid for this skbuff: 1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff has already been parsed before for inner header fetching to register. 2) Validate that the percpu area refers to this skbuff using the skbuff pointer as a cookie. If there is a cookie mismatch, then this skbuff needs to be parsed again. 3) Finally, validate if the percpu area refers to this tunnel type. Only after these three checks the percpu area is restored to a on-stack copy and bh is enabled again. After inner header fetching, the on-stack copy is stored back to the percpu area. Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-12-03Merge tag 'for-6.13-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - add lockdep annotations for io_uring/encoded read integration, inode lock is held when returning to userspace - properly reflect experimental config option to sysfs - handle NULL root in case the rescue mode accepts invalid/damaged tree roots (rescue=ibadroot) - regression fix of a deadlock between transaction and extent locks - fix pending bio accounting bug in encoded read ioctl - fix NOWAIT mode when checking references for NOCOW files - fix use-after-free in a rb-tree cleanup in ref-verify debugging tool * tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix lockdep warnings on io_uring encoded reads btrfs: ref-verify: fix use-after-free after invalid ref action btrfs: add a sanity check for btrfs root in btrfs_search_slot() btrfs: don't loop for nowait writes when checking for cross references btrfs: sysfs: advertise experimental features only if CONFIG_BTRFS_EXPERIMENTAL=y btrfs: fix deadlock between transaction commits and extent locks btrfs: fix use-after-free in btrfs_encoded_read_endio()
2024-12-03Merge tag 'fs_for_v6.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and udf fixes from Jan Kara: "Two small UDF fixes for better handling of corrupted filesystem and a quota fix to fix handling of filesystem freezing" * tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Verify inode link counts before performing rename udf: Skip parent dir link count update if corrupted quota: flush quota_release_work upon quota writeback
2024-12-03Merge tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: - Use xchg() in xlog_cil_insert_pcp_aggregate() - Fix ABBA deadlock on a race between mount and log shutdown - Fix quota softlimit incoherency on delalloc - Fix sparse inode limits on runt AG - remove unknown compat feature checks in SB write valdation - Eliminate a lockdep false positive * tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay xfs: Use xchg() in xlog_cil_insert_pcp_aggregate() xfs: prevent mount and log shutdown race xfs: delalloc and quota softlimit timers are incoherent xfs: fix sparse inode limits on runt AG xfs: remove unknown compat feature check in superblock write validation xfs: eliminate lockdep false positives in xfs_attr_shortform_list
2024-12-03igb: Fix potential invalid memory access in igb_init_module()Yuan Can
The pci_register_driver() can fail and when this happened, the dca_notifier needs to be unregistered, otherwise the dca_notifier can be called when igb fails to install, resulting to invalid memory access. Fixes: bbd98fe48a43 ("igb: Fix DCA errors and do not use context index for 82576") Signed-off-by: Yuan Can <yuancan@huawei.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ixgbe: Correct BASE-BX10 compliance codeTore Amundsen
SFF-8472 (section 5.4 Transceiver Compliance Codes) defines bit 6 as BASE-BX10. Bit 6 means a value of 0x40 (decimal 64). The current value in the source code is 0x64, which appears to be a mix-up of hex and decimal values. A value of 0x64 (binary 01100100) incorrectly sets bit 2 (1000BASE-CX) and bit 5 (100BASE-FX) as well. Fixes: 1b43e0d20f2d ("ixgbe: Add 1000BASE-BX support") Signed-off-by: Tore Amundsen <tore@amundsen.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Ernesto Castellotti <ernesto@castellotti.net> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ixgbe: downgrade logging of unsupported VF API version to debugJacob Keller
The ixgbe PF driver logs an info message when a VF attempts to negotiate an API version which it does not support: VF 0 requested invalid api version 6 The ixgbevf driver attempts to load with mailbox API v1.5, which is required for best compatibility with other hosts such as the ESX VMWare PF. The Linux PF only supports API v1.4, and does not currently have support for the v1.5 API. The logged message can confuse users, as the v1.5 API is valid, but just happens to not currently be supported by the Linux PF. Downgrade the info message to a debug message, and fix the language to use 'unsupported' instead of 'invalid' to improve message clarity. Long term, we should investigate whether the improvements in the v1.5 API make sense for the Linux PF, and if so implement them properly. This may require yet another API version to resolve issues with negotiating IPSEC offload support. Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Reported-by: Yifei Liu <yifei.l.liu@oracle.com> Link: https://lore.kernel.org/intel-wired-lan/20240301235837.3741422-1-yifei.l.liu@oracle.com/ Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5Jacob Keller
Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") added support for v1.5 of the PF to VF mailbox communication API. This commit mistakenly enabled IPSEC offload for API v1.5. No implementation of the v1.5 API has support for IPSEC offload. This offload is only supported by the Linux PF as mailbox API v1.4. In fact, the v1.5 API is not implemented in any Linux PF. Attempting to enable IPSEC offload on a PF which supports v1.5 API will not work. Only the Linux upstream ixgbe and ixgbevf support IPSEC offload, and only as part of the v1.4 API. Fix the ixgbevf Linux driver to stop attempting IPSEC offload when the mailbox API does not support it. The existing API design choice makes it difficult to support future API versions, as other non-Linux hosts do not implement IPSEC offload. If we add support for v1.5 to the Linux PF, then we lose support for IPSEC offload. A full solution likely requires a new mailbox API with a proper negotiation to check that IPSEC is actually supported by the host. Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03idpf: set completion tag for "empty" bufs associated with a packetJoshua Hay
Commit d9028db618a6 ("idpf: convert to libeth Tx buffer completion") inadvertently removed code that was necessary for the tx buffer cleaning routine to iterate over all buffers associated with a packet. When a frag is too large for a single data descriptor, it will be split across multiple data descriptors. This means the frag will span multiple buffers in the buffer ring in order to keep the descriptor and buffer ring indexes aligned. The buffer entries in the ring are technically empty and no cleaning actions need to be performed. These empty buffers can precede other frags associated with the same packet. I.e. a single packet on the buffer ring can look like: buf[0]=skb0.frag0 buf[1]=skb0.frag1 buf[2]=empty buf[3]=skb0.frag2 The cleaning routine iterates through these buffers based on a matching completion tag. If the completion tag is not set for buf2, the loop will end prematurely. Frag2 will be left uncleaned and next_to_clean will be left pointing to the end of packet, which will break the cleaning logic for subsequent cleans. This consequently leads to tx timeouts. Assign the empty bufs the same completion tag for the packet to ensure the cleaning routine iterates over all of the buffers associated with the packet. Fixes: d9028db618a6 ("idpf: convert to libeth Tx buffer completion") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Madhu chittim <madhu.chittim@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ice: Fix VLAN pruning in switchdev modeMarcin Szycik
In switchdev mode the uplink VSI should receive all unmatched packets, including VLANs. Therefore, VLAN pruning should be disabled if uplink is in switchdev mode. It is already being done in ice_eswitch_setup_env(), however the addition of ice_up() in commit 44ba608db509 ("ice: do switchdev slow-path Rx using PF VSI") caused VLAN pruning to be re-enabled after disabling it. Add a check to ice_set_vlan_filtering_features() to ensure VLAN filtering will not be enabled if uplink is in switchdev mode. Note that ice_is_eswitch_mode_switchdev() is being used instead of ice_is_switchdev_running(), as the latter would only return true after the whole switchdev setup completes. Fixes: 44ba608db509 ("ice: do switchdev slow-path Rx using PF VSI") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Priya Singh <priyax.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ice: Fix NULL pointer dereference in switchdevWojciech Drewek
Commit 608a5c05c39b ("virtchnl: support queue rate limit and quanta size configuration") introduced new virtchnl ops: - get_qos_caps - cfg_q_bw - cfg_q_quanta New ops were added to ice_virtchnl_dflt_ops, in commit 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration"), but not to the ice_virtchnl_repr_ops. Because of that, if we get one of those messages in switchdev mode we end up with NULL pointer dereference: [ 1199.794701] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1199.794804] Workqueue: ice ice_service_task [ice] [ 1199.794878] RIP: 0010:0x0 [ 1199.795027] Call Trace: [ 1199.795033] <TASK> [ 1199.795039] ? __die+0x20/0x70 [ 1199.795051] ? page_fault_oops+0x140/0x520 [ 1199.795064] ? exc_page_fault+0x7e/0x270 [ 1199.795074] ? asm_exc_page_fault+0x22/0x30 [ 1199.795086] ice_vc_process_vf_msg+0x6e5/0xd30 [ice] [ 1199.795165] __ice_clean_ctrlq+0x734/0x9d0 [ice] [ 1199.795207] ice_service_task+0xccf/0x12b0 [ice] [ 1199.795248] process_one_work+0x21a/0x620 [ 1199.795260] worker_thread+0x18d/0x330 [ 1199.795269] ? __pfx_worker_thread+0x10/0x10 [ 1199.795279] kthread+0xec/0x120 [ 1199.795288] ? __pfx_kthread+0x10/0x10 [ 1199.795296] ret_from_fork+0x2d/0x50 [ 1199.795305] ? __pfx_kthread+0x10/0x10 [ 1199.795312] ret_from_fork_asm+0x1a/0x30 [ 1199.795323] </TASK> Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ice: fix PHY timestamp extraction for ETH56GPrzemyslaw Korba
Fix incorrect PHY timestamp extraction for ETH56G. It's better to use FIELD_PREP() than manual shift. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03ice: fix PHY Clock Recovery availability checkArkadiusz Kubalewski
To check if PHY Clock Recovery mechanic is available for a device, there is a need to verify if given PHY is available within the netlist, but the netlist node type used for the search is wrong, also the search context shall be specified. Modify the search function to allow specifying the context in the search. Use the PHY node type instead of CLOCK CONTROLLER type, also use proper search context which for PHY search is PORT, as defined in E810 Datasheet [1] ('3.3.8.2.4 Node Part Number and Node Options (0x0003)' and 'Table 3-105. Program Topology Device NVM Admin Command'). [1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true Fixes: 91e43ca0090b ("ice: fix linking when CONFIG_PTP_1588_CLOCK=n") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-03module: Convert default symbol namespace to string literalMasahiro Yamada
Commit cdd30ebb1b9f ("module: Convert symbol namespace to string literal") only converted MODULE_IMPORT_NS() and EXPORT_SYMBOL_NS(), leaving DEFAULT_SYMBOL_NAMESPACE as a macro expansion. This commit converts DEFAULT_SYMBOL_NAMESPACE in the same way to avoid annoyance for the default namespace as well. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-03doc: module: revert misconversions for MODULE_IMPORT_NS()Masahiro Yamada
This reverts the misconversions introduced by commit cdd30ebb1b9f ("module: Convert symbol namespace to string literal"). The affected descriptions refer to MODULE_IMPORT_NS() tags in general, rather than suggesting the use of the empty string ("") as the namespace. Fixes: cdd30ebb1b9f ("module: Convert symbol namespace to string literal") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-03scripts/nsdeps: get 'make nsdeps' working againMasahiro Yamada
Since commit cdd30ebb1b9f ("module: Convert symbol namespace to string literal"), when MODULE_IMPORT_NS() is missing, 'make nsdeps' inserts pointless code: MODULE_IMPORT_NS("ns"); Here, "ns" is not a namespace, but the variable in the semantic patch. It must not be quoted. Instead, a string literal must be passed to Coccinelle. Fixes: cdd30ebb1b9f ("module: Convert symbol namespace to string literal") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-03LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- arch/loongarch/kvm/../../../virt/kvm/kvm_main.c:5945 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 2 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b940 0000000000000000 900000012c57b948 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b7b0 0000000000000001 0000000000000001 4b031894b9d6b725 0000000005dec000 9000000100427b00 00000000000003d2 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000005dec000 0000000000000000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 000000001fe001e5 00000000000000b0 0000000000000007 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff80000221f47c>] kvm_io_bus_read+0x19c/0x1e0 [kvm] [<ffff800002225118>] kvm_emu_mmio_read+0xd8/0x440 [kvm] [<ffff8000022254bc>] kvm_handle_read_fault+0x3c/0xe0 [kvm] [<ffff80000222b3c8>] kvm_handle_exit+0x228/0x480 [kvm] Fix it by protecting kvm_io_bus_{read,write}() with SRCU. Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-12-03net: hsr: must allocate more bytes for RedBox supportEric Dumazet
Blamed commit forgot to change hsr_init_skb() to allocate larger skb for RedBox case. Indeed, send_hsr_supervision_frame() will add two additional components (struct hsr_sup_tlv and struct hsr_sup_payload) syzbot reported the following crash: skbuff: skb_over_panic: text:ffffffff8afd4b0a len:34 put:6 head:ffff88802ad29e00 data:ffff88802ad29f22 tail:0x144 end:0x140 dev:gretap0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:206 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 2 UID: 0 PID: 7611 Comm: syz-executor Not tainted 6.12.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:skb_panic+0x157/0x1d0 net/core/skbuff.c:206 Code: b6 04 01 84 c0 74 04 3c 03 7e 21 8b 4b 70 41 56 45 89 e8 48 c7 c7 a0 7d 9b 8c 41 57 56 48 89 ee 52 4c 89 e2 e8 9a 76 79 f8 90 <0f> 0b 4c 89 4c 24 10 48 89 54 24 08 48 89 34 24 e8 94 76 fb f8 4c RSP: 0018:ffffc90000858ab8 EFLAGS: 00010282 RAX: 0000000000000087 RBX: ffff8880598c08c0 RCX: ffffffff816d3e69 RDX: 0000000000000000 RSI: ffffffff816de786 RDI: 0000000000000005 RBP: ffffffff8c9b91c0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000302 R11: ffffffff961cc1d0 R12: ffffffff8afd4b0a R13: 0000000000000006 R14: ffff88804b938130 R15: 0000000000000140 FS: 000055558a3d6500(0000) GS:ffff88806a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1295974ff8 CR3: 000000002ab6e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> skb_over_panic net/core/skbuff.c:211 [inline] skb_put+0x174/0x1b0 net/core/skbuff.c:2617 send_hsr_supervision_frame+0x6fa/0x9e0 net/hsr/hsr_device.c:342 hsr_proxy_announce+0x1a3/0x4a0 net/hsr/hsr_device.c:436 call_timer_fn+0x1a0/0x610 kernel/time/timer.c:1794 expire_timers kernel/time/timer.c:1845 [inline] __run_timers+0x6e8/0x930 kernel/time/timer.c:2419 __run_timer_base kernel/time/timer.c:2430 [inline] __run_timer_base kernel/time/timer.c:2423 [inline] run_timer_base+0x111/0x190 kernel/time/timer.c:2439 run_timer_softirq+0x1a/0x40 kernel/time/timer.c:2449 handle_softirqs+0x213/0x8f0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu kernel/softirq.c:637 [inline] irq_exit_rcu+0xbb/0x120 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline] sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049 </IRQ> Fixes: 5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)") Reported-by: syzbot+7f4643b267cc680bfa1c@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20241202100558.507765-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03rtnetlink: fix double call of rtnl_link_get_net_ifla()Cong Wang
Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() implementation. This looks safer, however, it leads to a classic Time-of-Check to Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And because of the lack of checking error pointer of the second call, it also leads to a kernel crash as reported by syzbot. Fix this by getting rid of the second call, which already becomes redudant after Kuniyuki's work. We have to propagate the result of the first rtnl_link_get_net_ifla() down to each ->newlink(). Reported-by: syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6 Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.") Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.") Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.") Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03wifi: mac80211: fix station NSS capability initialization orderBenjamin Lin
Station's spatial streaming capability should be initialized before handling VHT OMN, because the handling requires the capability information. Fixes: a8bca3e9371d ("wifi: mac80211: track capability/opmode NSS separately") Signed-off-by: Benjamin Lin <benjamin-jw.lin@mediatek.com> Link: https://patch.msgid.link/20241118080722.9603-1-benjamin-jw.lin@mediatek.com [rewrite subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: mac80211: fix vif addr when switching from monitor to stationFelix Fietkau
Since adding support for opting out of virtual monitor support, a zero vif addr was used to indicate passive vs active monitor to the driver. This would break the vif->addr when changing the netdev mac address before switching the interface from monitor to sta mode. Fix the regression by adding a separate flag to indicate whether vif->addr is valid. Reported-by: syzbot+9ea265d998de25ac6a46@syzkaller.appspotmail.com Fixes: 9d40f7e32774 ("wifi: mac80211: add flag to opt out of virtual monitor support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20241115115850.37449-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: mac80211: fix a queue stall in certain cases of CSAEmmanuel Grumbach
If we got an unprotected action frame with CSA and then we heard the beacon with the CSA IE, we'll block the queues with the CSA reason twice. Since this reason is refcounted, we won't wake up the queues since we wake them up only once and the ref count will never reach 0. This led to blocked queues that prevented any activity (even disconnection wouldn't reset the queue state and the only way to recover would be to reload the kernel module. Fix this by not refcounting the CSA reason. It becomes now pointless to maintain the csa_blocked_queues state. Remove it. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Fixes: 414e090bc41d ("wifi: mac80211: restrict public action ECSA frame handling") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219447 Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20241119173108.5ea90828c2cc.I4f89e58572fb71ae48e47a81e74595cac410fbac@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: mac80211: wake the queues in case of failure in resumeEmmanuel Grumbach
In case we fail to resume, we'll WARN with "Hardware became unavailable during restart." and we'll wait until user space does something. It'll typically bring the interface down and up to recover. This won't work though because the queues are still stopped on IEEE80211_QUEUE_STOP_REASON_SUSPEND reason. Make sure we clear that reason so that we give a chance to the recovery to succeed. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219447 Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20241119173108.cd628f560f97.I76a15fdb92de450e5329940125f3c58916be3942@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: cfg80211: clear link ID from bitmap during link delete after clean upAditya Kumar Singh
Currently, during link deletion, the link ID is first removed from the valid_links bitmap before performing any clean-up operations. However, some functions require the link ID to remain in the valid_links bitmap. One such example is cfg80211_cac_event(). The flow is - nl80211_remove_link() cfg80211_remove_link() ieee80211_del_intf_link() ieee80211_vif_set_links() ieee80211_vif_update_links() ieee80211_link_stop() cfg80211_cac_event() cfg80211_cac_event() requires link ID to be present but it is cleared already in cfg80211_remove_link(). Ultimately, WARN_ON() is hit. Therefore, clear the link ID from the bitmap only after completing the link clean-up. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://patch.msgid.link/20241121-mlo_dfs_fix-v2-1-92c3bf7ab551@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beaconHaoyu Li
With the new __counted_by annocation in cfg80211_mbssid_elems, the "cnt" struct member must be set before accessing the "elem" array. Failing to do so will trigger a runtime warning when enabling CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Fixes: c14679d7005a ("wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by") Signed-off-by: Haoyu Li <lihaoyu499@gmail.com> Link: https://patch.msgid.link/20241123172500.311853-1-lihaoyu499@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: mac80211: fix mbss changed flags corruption on 32 bit systemsIssam Hamdi
On 32-bit systems, the size of an unsigned long is 4 bytes, while a u64 is 8 bytes. Therefore, when using or_each_set_bit(bit, &bits, sizeof(changed) * BITS_PER_BYTE), the code is incorrectly searching for a bit in a 32-bit variable that is expected to be 64 bits in size, leading to incorrect bit finding. Solution: Ensure that the size of the bits variable is correctly adjusted for each architecture. Call Trace: ? show_regs+0x54/0x58 ? __warn+0x6b/0xd4 ? ieee80211_link_info_change_notify+0xcc/0xd4 [mac80211] ? report_bug+0x113/0x150 ? exc_overflow+0x30/0x30 ? handle_bug+0x27/0x44 ? exc_invalid_op+0x18/0x50 ? handle_exception+0xf6/0xf6 ? exc_overflow+0x30/0x30 ? ieee80211_link_info_change_notify+0xcc/0xd4 [mac80211] ? exc_overflow+0x30/0x30 ? ieee80211_link_info_change_notify+0xcc/0xd4 [mac80211] ? ieee80211_mesh_work+0xff/0x260 [mac80211] ? cfg80211_wiphy_work+0x72/0x98 [cfg80211] ? process_one_work+0xf1/0x1fc ? worker_thread+0x2c0/0x3b4 ? kthread+0xc7/0xf0 ? mod_delayed_work_on+0x4c/0x4c ? kthread_complete_and_exit+0x14/0x14 ? ret_from_fork+0x24/0x38 ? kthread_complete_and_exit+0x14/0x14 ? ret_from_fork_asm+0xf/0x14 ? entry_INT80_32+0xf0/0xf0 Signed-off-by: Issam Hamdi <ih@simonwunderlich.de> Link: https://patch.msgid.link/20241125162920.2711462-1-ih@simonwunderlich.de [restore no-op path for no changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-oneLin Ma
Since the netlink attribute range validation provides inclusive checking, the *max* of attribute NL80211_ATTR_MLO_LINK_ID should be IEEE80211_MLD_MAX_NUM_LINKS - 1 otherwise causing an off-by-one. One crash stack for demonstration: ================================================================== BUG: KASAN: wild-memory-access in ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939 Read of size 6 at addr 001102080000000c by task fuzzer.386/9508 CPU: 1 PID: 9508 Comm: syz.1.386 Not tainted 6.1.70 #2 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106 print_report+0xe0/0x750 mm/kasan/report.c:398 kasan_report+0x139/0x170 mm/kasan/report.c:495 kasan_check_range+0x287/0x290 mm/kasan/generic.c:189 memcpy+0x25/0x60 mm/kasan/shadow.c:65 ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939 rdev_tx_control_port net/wireless/rdev-ops.h:761 [inline] nl80211_tx_control_port+0x7b3/0xc40 net/wireless/nl80211.c:15453 genl_family_rcv_msg_doit+0x22e/0x320 net/netlink/genetlink.c:756 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x539/0x740 net/netlink/genetlink.c:850 netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508 genl_rcv+0x24/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline] netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352 netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874 sock_sendmsg_nosec net/socket.c:716 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499 ___sys_sendmsg+0x21c/0x290 net/socket.c:2553 __sys_sendmsg net/socket.c:2582 [inline] __do_sys_sendmsg net/socket.c:2591 [inline] __se_sys_sendmsg+0x19e/0x270 net/socket.c:2589 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x63/0xcd Update the policy to ensure correct validation. Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Lin Ma <linma@zju.edu.cn> Suggested-by: Cengiz Can <cengiz.can@canonical.com> Link: https://patch.msgid.link/20241130170526.96698-1-linma@zju.edu.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-03net/qed: allow old cards not supporting "num_images" to workLouis Leseur
Commit 43645ce03e00 ("qed: Populate nvm image attribute shadow.") added support for populating flash image attributes, notably "num_images". However, some cards were not able to return this information. In such cases, the driver would return EINVAL, causing the driver to exit. Add check to return EOPNOTSUPP instead of EINVAL when the card is not able to return these information. The caller function already handles EOPNOTSUPP without error. Fixes: 43645ce03e00 ("qed: Populate nvm image attribute shadow.") Co-developed-by: Florian Forestier <florian@forestier.re> Signed-off-by: Florian Forestier <florian@forestier.re> Signed-off-by: Louis Leseur <louis.leseur@gmail.com> Link: https://patch.msgid.link/20241128083633.26431-1-louis.leseur@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03Merge branch 'two-fixes-for-smc'Paolo Abeni
Wen Gu says: ==================== two fixes for SMC This patch set contains two bugfixes, to fix SMC warning and panic issues in race conditions. ==================== Link: https://patch.msgid.link/20241127133014.100509-1-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03net/smc: fix LGR and link use-after-free issueWen Gu
We encountered a LGR/link use-after-free issue, which manifested as the LGR/link refcnt reaching 0 early and entering the clear process, making resource access unsafe. refcount_t: addition on 0; use-after-free. WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140 Workqueue: events smc_lgr_terminate_work [smc] Call trace: refcount_warn_saturate+0x9c/0x140 __smc_lgr_terminate.part.45+0x2a8/0x370 [smc] smc_lgr_terminate_work+0x28/0x30 [smc] process_one_work+0x1b8/0x420 worker_thread+0x158/0x510 kthread+0x114/0x118 or refcount_t: underflow; use-after-free. WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140 Workqueue: smc_hs_wq smc_listen_work [smc] Call trace: refcount_warn_saturate+0xf0/0x140 smcr_link_put+0x1cc/0x1d8 [smc] smc_conn_free+0x110/0x1b0 [smc] smc_conn_abort+0x50/0x60 [smc] smc_listen_find_device+0x75c/0x790 [smc] smc_listen_work+0x368/0x8a0 [smc] process_one_work+0x1b8/0x420 worker_thread+0x158/0x510 kthread+0x114/0x118 It is caused by repeated release of LGR/link refcnt. One suspect is that smc_conn_free() is called repeatedly because some smc_conn_free() from server listening path are not protected by sock lock. e.g. Calls under socklock | smc_listen_work ------------------------------------------------------- lock_sock(sk) | smc_conn_abort smc_conn_free | \- smc_conn_free \- smcr_link_put | \- smcr_link_put (duplicated) release_sock(sk) So here add sock lock protection in smc_listen_work() path, making it exclusive with other connection operations. Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Co-developed-by: Kai <KaiShen@linux.alibaba.com> Signed-off-by: Kai <KaiShen@linux.alibaba.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03net/smc: initialize close_work early to avoid warningWen Gu
We encountered a warning that close_work was canceled before initialization. WARNING: CPU: 7 PID: 111103 at kernel/workqueue.c:3047 __flush_work+0x19e/0x1b0 Workqueue: events smc_lgr_terminate_work [smc] RIP: 0010:__flush_work+0x19e/0x1b0 Call Trace: ? __wake_up_common+0x7a/0x190 ? work_busy+0x80/0x80 __cancel_work_timer+0xe3/0x160 smc_close_cancel_work+0x1a/0x70 [smc] smc_close_active_abort+0x207/0x360 [smc] __smc_lgr_terminate.part.38+0xc8/0x180 [smc] process_one_work+0x19e/0x340 worker_thread+0x30/0x370 ? process_one_work+0x340/0x340 kthread+0x117/0x130 ? __kthread_cancel_work+0x50/0x50 ret_from_fork+0x22/0x30 This is because when smc_close_cancel_work is triggered, e.g. the RDMA driver is rmmod and the LGR is terminated, the conn->close_work is flushed before initialization, resulting in WARN_ON(!work->func). __smc_lgr_terminate | smc_connect_{rdma|ism} ------------------------------------------------------------- | smc_conn_create | \- smc_lgr_register_conn for conn in lgr->conns_all | \- smc_conn_kill | \- smc_close_active_abort | \- smc_close_cancel_work | \- cancel_work_sync | \- __flush_work | (close_work) | | smc_close_init | \- INIT_WORK(&close_work) So fix this by initializing close_work before establishing the connection. Fixes: 46c28dbd4c23 ("net/smc: no socket state changes in tasklet context") Fixes: 413498440e30 ("net/smc: add SMC-D support in af_smc") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03tipc: Fix use-after-free of kernel socket in cleanup_bearer().Kuniyuki Iwashima
syzkaller reported a use-after-free of UDP kernel socket in cleanup_bearer() without repro. [0][1] When bearer_disable() calls tipc_udp_disable(), cleanup of the UDP kernel socket is deferred by work calling cleanup_bearer(). tipc_net_stop() waits for such works to finish by checking tipc_net(net)->wq_count. However, the work decrements the count too early before releasing the kernel socket, unblocking cleanup_net() and resulting in use-after-free. Let's move the decrement after releasing the socket in cleanup_bearer(). [0]: ref_tracker: net notrefcnt@000000009b3d1faf has 1/1 users at sk_alloc+0x438/0x608 inet_create+0x4c8/0xcb0 __sock_create+0x350/0x6b8 sock_create_kern+0x58/0x78 udp_sock_create4+0x68/0x398 udp_sock_create+0x88/0xc8 tipc_udp_enable+0x5e8/0x848 __tipc_nl_bearer_enable+0x84c/0xed8 tipc_nl_bearer_enable+0x38/0x60 genl_family_rcv_msg_doit+0x170/0x248 genl_rcv_msg+0x400/0x5b0 netlink_rcv_skb+0x1dc/0x398 genl_rcv+0x44/0x68 netlink_unicast+0x678/0x8b0 netlink_sendmsg+0x5e4/0x898 ____sys_sendmsg+0x500/0x830 [1]: BUG: KMSAN: use-after-free in udp_hashslot include/net/udp.h:85 [inline] BUG: KMSAN: use-after-free in udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979 udp_hashslot include/net/udp.h:85 [inline] udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979 sk_common_release+0xaf/0x3f0 net/core/sock.c:3820 inet_release+0x1e0/0x260 net/ipv4/af_inet.c:437 inet6_release+0x6f/0xd0 net/ipv6/af_inet6.c:489 __sock_release net/socket.c:658 [inline] sock_release+0xa0/0x210 net/socket.c:686 cleanup_bearer+0x42d/0x4c0 net/tipc/udp_media.c:819 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310 worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391 kthread+0x531/0x6b0 kernel/kthread.c:389 ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 Uninit was created at: slab_free_hook mm/slub.c:2269 [inline] slab_free mm/slub.c:4580 [inline] kmem_cache_free+0x207/0xc40 mm/slub.c:4682 net_free net/core/net_namespace.c:454 [inline] cleanup_net+0x16f2/0x19d0 net/core/net_namespace.c:647 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310 worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391 kthread+0x531/0x6b0 kernel/kthread.c:389 ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 CPU: 0 UID: 0 PID: 54 Comm: kworker/0:2 Not tainted 6.12.0-rc1-00131-gf66ebf37d69c #7 91723d6f74857f70725e1583cba3cf4adc716cfa Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: events cleanup_bearer Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241127050512.28438-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-03dccp: Fix memory leak in dccp_feat_change_recvIvan Solodovnikov
If dccp_feat_push_confirm() fails after new value for SP feature was accepted without reconciliation ('entry == NULL' branch), memory allocated for that value with dccp_feat_clone_sp_val() is never freed. Here is the kmemleak stack for this: unreferenced object 0xffff88801d4ab488 (size 8): comm "syz-executor310", pid 1127, jiffies 4295085598 (age 41.666s) hex dump (first 8 bytes): 01 b4 4a 1d 80 88 ff ff ..J..... backtrace: [<00000000db7cabfe>] kmemdup+0x23/0x50 mm/util.c:128 [<0000000019b38405>] kmemdup include/linux/string.h:465 [inline] [<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:371 [inline] [<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:367 [inline] [<0000000019b38405>] dccp_feat_change_recv net/dccp/feat.c:1145 [inline] [<0000000019b38405>] dccp_feat_parse_options+0x1196/0x2180 net/dccp/feat.c:1416 [<00000000b1f6d94a>] dccp_parse_options+0xa2a/0x1260 net/dccp/options.c:125 [<0000000030d7b621>] dccp_rcv_state_process+0x197/0x13d0 net/dccp/input.c:650 [<000000001f74c72e>] dccp_v4_do_rcv+0xf9/0x1a0 net/dccp/ipv4.c:688 [<00000000a6c24128>] sk_backlog_rcv include/net/sock.h:1041 [inline] [<00000000a6c24128>] __release_sock+0x139/0x3b0 net/core/sock.c:2570 [<00000000cf1f3a53>] release_sock+0x54/0x1b0 net/core/sock.c:3111 [<000000008422fa23>] inet_wait_for_connect net/ipv4/af_inet.c:603 [inline] [<000000008422fa23>] __inet_stream_connect+0x5d0/0xf70 net/ipv4/af_inet.c:696 [<0000000015b6f64d>] inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:735 [<0000000010122488>] __sys_connect_file+0x15c/0x1a0 net/socket.c:1865 [<00000000b4b70023>] __sys_connect+0x165/0x1a0 net/socket.c:1882 [<00000000f4cb3815>] __do_sys_connect net/socket.c:1892 [inline] [<00000000f4cb3815>] __se_sys_connect net/socket.c:1889 [inline] [<00000000f4cb3815>] __x64_sys_connect+0x6e/0xb0 net/socket.c:1889 [<00000000e7b1e839>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 [<0000000055e91434>] entry_SYSCALL_64_after_hwframe+0x67/0xd1 Clean up the allocated memory in case of dccp_feat_push_confirm() failure and bail out with an error reset code. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: e77b8363b2ea ("dccp: Process incoming Change feature-negotiation options") Signed-off-by: Ivan Solodovnikov <solodovnikov.ia@phystech.edu> Link: https://patch.msgid.link/20241126143902.190853-1-solodovnikov.ia@phystech.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-02net/ipv6: release expired exception dst cached in socketJiri Wiesner
Dst objects get leaked in ip6_negative_advice() when this function is executed for an expired IPv6 route located in the exception table. There are several conditions that must be fulfilled for the leak to occur: * an ICMPv6 packet indicating a change of the MTU for the path is received, resulting in an exception dst being created * a TCP connection that uses the exception dst for routing packets must start timing out so that TCP begins retransmissions * after the exception dst expires, the FIB6 garbage collector must not run before TCP executes ip6_negative_advice() for the expired exception dst When TCP executes ip6_negative_advice() for an exception dst that has expired and if no other socket holds a reference to the exception dst, the refcount of the exception dst is 2, which corresponds to the increment made by dst_init() and the increment made by the TCP socket for which the connection is timing out. The refcount made by the socket is never released. The refcount of the dst is decremented in sk_dst_reset() but that decrement is counteracted by a dst_hold() intentionally placed just before the sk_dst_reset() in ip6_negative_advice(). After ip6_negative_advice() has finished, there is no other object tied to the dst. The socket lost its reference stored in sk_dst_cache and the dst is no longer in the exception table. The exception dst becomes a leaked object. As a result of this dst leak, an unbalanced refcount is reported for the loopback device of a net namespace being destroyed under kernels that do not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"): unregister_netdevice: waiting for lo to become free. Usage count = 2 Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The patch that introduced the dst_hold() in ip6_negative_advice() was 92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22 merely refactored the code with regards to the dst refcount so the issue was present even before 92f1655aa2b22. The bug was introduced in 54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually expired.") where the expired cached route is deleted and the sk_dst_cache member of the socket is set to NULL by calling dst_negative_advice() but the refcount belonging to the socket is left unbalanced. The IPv4 version - ipv4_negative_advice() - is not affected by this bug. When the TCP connection times out ipv4_negative_advice() merely resets the sk_dst_cache of the socket while decrementing the refcount of the exception dst. Fixes: 92f1655aa2b22 ("net: fix __dst_negative_advice() race") Fixes: 54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually expired.") Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241128085950.GA4505@incl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-02net: phy: microchip: Reset LAN88xx PHY to ensure clean link state on ↵Oleksij Rempel
LAN7800/7850 Fix outdated MII_LPA data in the LAN88xx PHY, which is used in LAN7800 and LAN7850 USB Ethernet controllers. Due to a hardware limitation, the PHY cannot reliably update link status after parallel detection when the link partner does not support auto-negotiation. To mitigate this, add a PHY reset in `lan88xx_link_change_notify()` when `phydev->state` is `PHY_NOLINK`, ensuring the PHY starts in a clean state and reports accurate fixed link parallel detection results. Fixes: 792aec47d59d9 ("add microchip LAN88xx phy driver") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241125084050.414352-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-02Merge tag 'linux-can-fixes-for-6.13-20241202' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-12-02 The first patch is by me and allows the use of sleeping GPIOs to set termination GPIOs. Alexander Kozhinov fixes the gs_usb driver to use the endpoints provided by the usb endpoint descriptions instead of hard coded ones. Dario Binacchi contributes 11 statistics related patches for various CAN driver. A potential use after free in the hi311x is fixed. The statistics for the c_can, sun4i_can, hi311x, m_can, ifi_canfd, sja1000, sun4i_can, ems_usb, f81604 are fixed: update statistics even if the allocation of the error skb fails and fix the incrementing of the rx,tx error counters. A patch by me fixes the workaround for DS80000789E 6 erratum in the mcp251xfd driver. The last patch is by Dmitry Antipov, targets the j1939 CAN protocol and fixes a skb reference counting issue. * tag 'linux-can-fixes-for-6.13-20241202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: j1939: j1939_session_new(): fix skb reference counting can: mcp251xfd: mcp251xfd_get_tef_len(): work around erratum DS80000789E 6. can: f81604: f81604_handle_can_bus_errors(): fix {rx,tx}_errors statistics can: ems_usb: ems_usb_rx_err(): fix {rx,tx}_errors statistics can: sun4i_can: sun4i_can_err(): fix {rx,tx}_errors statistics can: sja1000: sja1000_err(): fix {rx,tx}_errors statistics can: hi311x: hi3110_can_ist(): fix {rx,tx}_errors statistics can: ifi_canfd: ifi_canfd_handle_lec_err(): fix {rx,tx}_errors statistics can: m_can: m_can_handle_lec_err(): fix {rx,tx}_errors statistics can: hi311x: hi3110_can_ist(): update state error statistics if skb allocation fails can: hi311x: hi3110_can_ist(): fix potential use-after-free can: sun4i_can: sun4i_can_err(): call can_change_state() even if cf is NULL can: c_can: c_can_handle_bus_err(): update statistics if skb allocation fails can: gs_usb: add usb endpoint address detection at driver probe step can: dev: can_set_termination(): allow sleeping GPIOs ==================== Link: https://patch.msgid.link/20241202090040.1110280-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-02MAINTAINERS: list PTP drivers under networkingJakub Kicinski
PTP patches go via the netdev trees, add drivers/ptp/ to the networking entry so that get_maintainer.pl --scm lists those trees above Linus's tree. Thanks to the real entry using drivers/ptp/* the original entry will still be considered more specific / higher prio. Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://patch.msgid.link/20241130214100.125325-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-02platform/x86: asus-nb-wmi: Ignore unknown event 0xCFArmin Wolf
On the Asus X541UAK an unknown event 0xCF is emited when the charger is plugged in. This is caused by the following AML code: If (ACPS ()) { ACPF = One Local0 = 0x58 If (ATKP) { ^^^^ATKD.IANE (0xCF) } } Else { ACPF = Zero Local0 = 0x57 } Notify (AC0, 0x80) // Status Change If (ATKP) { ^^^^ATKD.IANE (Local0) } Sleep (0x64) PNOT () Sleep (0x0A) NBAT (0x80) Ignore the 0xCF event to silence the unknown event warning. Reported-by: Pau Espin Pedrol <pespin@espeweb.net> Closes: https://lore.kernel.org/platform-driver-x86/54d4860b-ec9c-4992-acf6-db3f90388293@espeweb.net Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241123224700.18530-1-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-02platform/x86: asus-wmi: Ignore return value when writing thermal policyArmin Wolf
On some machines like the ASUS Vivobook S14 writing the thermal policy returns the currently writen thermal policy instead of an error code. Ignore the return code to avoid falsely returning an error when the thermal policy was written successfully. Reported-by: auslands-kv@gmx.de Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219517 Fixes: 2daa86e78c49 ("platform/x86: asus_wmi: Support throttle thermal policy") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241124171941.29789-1-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>