summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-28KVM: x86: Snapshot the host's DEBUGCTL in common x86Sean Christopherson
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in common x86, so that SVM can also use the snapshot. Opportunistically change the field to a u64. While bits 63:32 are reserved on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28KVM: SVM: Suppress DEBUGCTL.BTF on AMDSean Christopherson
Mark BTF as reserved in DEBUGCTL on AMD, as KVM doesn't actually support BTF, and fully enabling BTF virtualization is non-trivial due to interactions with the emulator, guest_debug, #DB interception, nested SVM, etc. Don't inject #GP if the guest attempts to set BTF, as there's no way to communicate lack of support to the guest, and instead suppress the flag and treat the WRMSR as (partially) unsupported. In short, make KVM behave the same on AMD and Intel (VMX already squashes BTF). Note, due to other bugs in KVM's handling of DEBUGCTL, the only way BTF has "worked" in any capacity is if the guest simultaneously enables LBRs. Reported-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: stable@vger.kernel.org Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250227222411.3490595-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective valueSean Christopherson
Drop bits 5:2 from the guest's effective DEBUGCTL value, as AMD changed the architectural behavior of the bits and broke backwards compatibility. On CPUs without BusLockTrap (or at least, in APMs from before ~2023), bits 5:2 controlled the behavior of external pins: Performance-Monitoring/Breakpoint Pin-Control (PBi)—Bits 5:2, read/write. Software uses thesebits to control the type of information reported by the four external performance-monitoring/breakpoint pins on the processor. When a PBi bit is cleared to 0, the corresponding external pin (BPi) reports performance-monitor information. When a PBi bit is set to 1, the corresponding external pin (BPi) reports breakpoint information. With the introduction of BusLockTrap, presumably to be compatible with Intel CPUs, AMD redefined bit 2 to be BLCKDB: Bus Lock #DB Trap (BLCKDB)—Bit 2, read/write. Software sets this bit to enable generation of a #DB trap following successful execution of a bus lock when CPL is > 0. and redefined bits 5:3 (and bit 6) as "6:3 Reserved MBZ". Ideally, KVM would treat bits 5:2 as reserved. Defer that change to a feature cleanup to avoid breaking existing guest in LTS kernels. For now, drop the bits to retain backwards compatibility (of a sort). Note, dropping bits 5:2 is still a guest-visible change, e.g. if the guest is enabling LBRs *and* the legacy PBi bits, then the state of the PBi bits is visible to the guest, whereas now the guest will always see '0'. Reported-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: stable@vger.kernel.org Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250227222411.3490595-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28KVM: selftests: Assert that STI blocking isn't set after event injectionSean Christopherson
Add an L1 (guest) assert to the nested exceptions test to verify that KVM doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the guest's int_state if a #VMEXIT occurs before VMRUN fully completes). Add a similar assert to the VMX side as well, because why not. Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadowSean Christopherson
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff() so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk (some would say "bug"), where the STI shadow bleeds into the guest's intr_state field if a #VMEXIT occurs during injection of an event, i.e. if the VMRUN doesn't complete before the subsequent #VMEXIT. The spurious "interrupts masked" state is relatively benign, as it only occurs during event injection and is transient. Because KVM is already injecting an event, the guest can't be in HLT, and if KVM is querying IRQ blocking for injection, then KVM would need to force an immediate exit anyways since injecting multiple events is impossible. However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the spurious STI shadow is visible to L1 when running a nested VM, which can trip sanity checks, e.g. in VMware's VMM. Hoist the STI+CLI all the way to C code, as the aforementioned calls to guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already possible. I.e. if there's kernel code that is confused by running with RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also blocks NMIs, the only change in exposure to non-KVM code (relative to surrounding VMRUN with STI+CLI) is exception handling code, and except for the kvm_rebooting=1 case, all exception in the core VM-Enter/VM-Exit path are fatal. Use the "raw" variants to enable/disable IRQs to avoid tracing in the "no instrumentation" code; the guest state helpers also take care of tracing IRQ state. Oppurtunstically document why KVM needs to do STI in the first place. Reported-by: Doug Covelli <doug.covelli@broadcom.com> Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly") Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28Merge tag 'io_uring-6.14-20250228' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix headed for stable, ensuring that msg_control is properly saved in compat mode as well" * tag 'io_uring-6.14-20250228' of git://git.kernel.dk/linux: io_uring/net: save msg_control for compat
2025-02-28Merge tag 'efi-fixes-for-v6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "Another couple of EFI fixes for v6.14. Only James's patch stands out, as it implements a workaround for odd behavior in fwupd in user space, which creates EFI variables by touching a file in efivarfs, clearing the immutable bit (which gets set automatically for $reasons) and then opening it again for writing, none of which is really necessary. The fwupd author and LVFS maintainer is already rolling out a fix for this on the fwupd side, and suggested that the workaround in this PR could be backed out again during the next cycle. (There is a semantic mismatch in efivarfs where some essential variable attributes are stored in the first 4 bytes of the file, and so zero length files cannot exist, as they cannot be written back to the underlying variable store. So now, they are dropped once the last reference is released.) Summary: - Fix CPER error record parsing bugs - Fix a couple of efivarfs issues that were introduced in the merge window - Fix an issue in the early remapping code of the MOKvar table" * tag 'efi-fixes-for-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/mokvar-table: Avoid repeated map/unmap of the same page efi: Don't map the entire mokvar table to determine its size efivarfs: allow creation of zero length files efivarfs: Defer PM notifier registration until .fill_super efi/cper: Fix cper_arm_ctx_info alignment efi/cper: Fix cper_ia_proc_ctx alignment
2025-02-28Merge tag 'i2c-host-fixes-6.14-rc5' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.14-rc5 - npcm fixes interrupt initialization sequence. - ls2x fixes frequency setting. - amd-asf re-enables interrupts properly at irq handler's exit.
2025-02-28gpiolib: Fix Oops in gpiod_direction_input_nonotify()Dan Carpenter
The gpiod_direction_input_nonotify() function is supposed to return zero if the direction for the pin is input. But instead it accidentally returns GPIO_LINE_DIRECTION_IN (1) which will be cast into an ERR_PTR() in gpiochip_request_own_desc(). The callers dereference it and it leads to a crash. I changed gpiod_direction_output_raw_commit() just for consistency but returning GPIO_LINE_DIRECTION_OUT (0) is fine. Cc: stable@vger.kernel.org Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/254f3925-3015-4c9d-aac5-bb9b4b2cd2c5@stanley.mountain [Bartosz: moved the variable declarations to the top of the functions] Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-28block: fix 'kmem_cache of name 'bio-108' already exists'Ming Lei
Device mapper bioset often has big bio_slab size, which can be more than 1000, then 8byte can't hold the slab name any more, cause the kmem_cache allocation warning of 'kmem_cache of name 'bio-108' already exists'. Fix the warning by extending bio_slab->name to 12 bytes, but fix output of /proc/slabinfo Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250228132656.2838008-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28iommu/vt-d: Fix suspicious RCU usageLu Baolu
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 #55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: d74169ceb0d2 ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28iommu/vt-d: Remove device comparison in context_setup_pass_through_cbJerry Snitselaar
Remove the device comparison check in context_setup_pass_through_cb. pci_for_each_dma_alias already makes a decision on whether the callback function should be called for a device. With the check in place it will fail to create context entries for aliases as it walks up to the root bus. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28iommu/amd: Preserve default DTE fields when updating Host Page Table RootAlejandro Jimenez
When updating the page table root field on the DTE, avoid overwriting any bits that are already set. The earlier call to make_clear_dte() writes default values that all DTEs must have set (currently DTE[V]), and those must be preserved. Currently this doesn't cause problems since the page table root update is the first field that is set after make_clear_dte() is called, and DTE_FLAG_V is set again later along with the permission bits (IR/IW). Remove this redundant assignment too. Fixes: fd5dff9de4be ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250106191413.3107140-1-alejandro.j.jimenez@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28Merge patch series "Remove accesses to page->index from ceph"Christian Brauner
Remove page->index access from ceph. * patches from https://lore.kernel.org/r/20250217185119.430193-1-willy@infradead.org: fs: Remove page_mkwrite_check_truncate() ceph: Pass a folio to ceph_allocate_page_array() ceph: Convert ceph_move_dirty_page_in_page_array() to move_dirty_folio_in_page_array() ceph: Remove uses of page from ceph_process_folio_batch() ceph: Convert ceph_check_page_before_write() to use a folio ceph: Convert writepage_nounlock() to write_folio_nounlock() ceph: Convert ceph_readdir_cache_control to store a folio ceph: Convert ceph_find_incompatible() to take a folio ceph: Use a folio in ceph_page_mkwrite() ceph: Remove ceph_writepage() Link: https://lore.kernel.org/r/20250217185119.430193-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28fs: Remove page_mkwrite_check_truncate()Matthew Wilcox (Oracle)
All callers of this function have now been converted to use folio_mkwrite_check_truncate(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250221204421.3590340-1-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Pass a folio to ceph_allocate_page_array()Matthew Wilcox (Oracle)
Remove two accesses to page->index. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-10-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Convert ceph_move_dirty_page_in_page_array() to ↵Matthew Wilcox (Oracle)
move_dirty_folio_in_page_array() Shorten the name of this internal function by dropping the 'ceph_' prefix and pass in a folio instead of a page. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-9-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Remove uses of page from ceph_process_folio_batch()Matthew Wilcox (Oracle)
Remove uses of page->index and deprecated page APIs. Saves a lot of hidden calls to compound_head(). Also convert is_page_index_contiguous() to is_folio_index_contiguous() and make its arguments const. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-8-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Convert ceph_check_page_before_write() to use a folioMatthew Wilcox (Oracle)
Remove the conversion back to a struct page and just use the folio passed in. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-7-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Convert writepage_nounlock() to write_folio_nounlock()Matthew Wilcox (Oracle)
Remove references to page->index, page->mapping, thp_size(), page_offset() and other page APIs in favour of their more efficient folio replacements. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-6-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Convert ceph_readdir_cache_control to store a folioMatthew Wilcox (Oracle)
Pass a folio around instead of a page. This removes an access to page->index and a few hidden calls to compound_head(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-5-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Convert ceph_find_incompatible() to take a folioMatthew Wilcox (Oracle)
Both callers already have the folio. Pass it in and use it throughout. Removes some hidden calls to compound_head() and a reference to page->mapping. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-4-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Use a folio in ceph_page_mkwrite()Matthew Wilcox (Oracle)
Convert the passed page to a folio and use it throughout ceph_page_mkwrite(). Removes the last call to page_mkwrite_check_truncate(), the last call to offset_in_thp() and one of the last calls to thp_size(). Saves a few calls to compound_head(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-3-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: Remove ceph_writepage()Matthew Wilcox (Oracle)
Ceph already has a writepages operation which is preferred over writepage in all situations except for page migration. By adding a migrate_folio operation, there will be no situations in which ->writepage should be called. filemap_migrate_folio() is an appropriate operation to use because the ceph data stored in folio->private does not contain any reference to the memory address of the folio. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250217185119.430193-2-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28Merge patch series "ceph: fix generic/421 test failure"Christian Brauner
Viacheslav Dubeyko <slava@dubeyko.com> says: From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> The generic/421 fails to finish because of the issue: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.894678] INFO: task kworker/u48:0:11 blocked for more than 122 seconds. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895403] Not tainted 6.13.0-rc5+ #1 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896633] task:kworker/u48:0 state:D stack:0 pid:11 tgid:11 ppid:2 flags:0x00004000 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896641] Workqueue: writeback wb_workfn (flush-ceph-24) Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897614] Call Trace: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897620] <TASK> Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897629] __schedule+0x443/0x16b0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897637] schedule+0x2b/0x140 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897640] io_schedule+0x4c/0x80 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897643] folio_wait_bit_common+0x11b/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897646] ? _raw_spin_unlock_irq+0xe/0x50 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897652] ? __pfx_wake_page_function+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897655] __folio_lock+0x17/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897658] ceph_writepages_start+0xca9/0x1fb0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897663] ? fsnotify_remove_queued_event+0x2f/0x40 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897668] do_writepages+0xd2/0x240 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897672] __writeback_single_inode+0x44/0x350 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897675] writeback_sb_inodes+0x25c/0x550 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897680] wb_writeback+0x89/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897683] ? finish_task_switch.isra.0+0x97/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897687] wb_workfn+0xb5/0x410 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897689] process_one_work+0x188/0x3d0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897692] worker_thread+0x2b5/0x3c0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897694] ? __pfx_worker_thread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897696] kthread+0xe1/0x120 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897699] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897701] ret_from_fork+0x43/0x70 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897705] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897707] ret_from_fork_asm+0x1a/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897711] </TASK> There are several issues here: (1) ceph_kill_sb() doesn't wait ending of flushing all dirty folios/pages because of racy nature of mdsc->stopping_blockers. As a result, mdsc->stopping becomes CEPH_MDSC_STOPPING_FLUSHED too early. (2) The ceph_inc_osd_stopping_blocker(fsc->mdsc) fails to increment mdsc->stopping_blockers. Finally, already locked folios/pages are never been unlocked and the logic tries to lock the same page second time. (3) The folio_batch with found dirty pages by filemap_get_folios_tag() is not processed properly. And this is why some number of dirty pages simply never processed and we have dirty folios/pages after unmount anyway. This patchset is refactoring the ceph_writepages_start() method and it fixes the issues by means of: (1) introducing dirty_folios counter and flush_end_wq waiting queue in struct ceph_mds_client; (2) ceph_dirty_folio() increments the dirty_folios counter; (3) writepages_finish() decrements the dirty_folios counter and wake up all waiters on the queue if dirty_folios counter is equal or lesser than zero; (4) adding in ceph_kill_sb() method the logic of checking the value of dirty_folios counter and waiting if it is bigger than zero; (5) adding ceph_inc_osd_stopping_blocker() call in the beginning of the ceph_writepages_start() and ceph_dec_osd_stopping_blocker() at the end of the ceph_writepages_start() with the goal to resolve the racy nature of mdsc->stopping_blockers. sudo ./check generic/421 FSTYP -- ceph PLATFORM -- Linux/x86_64 ceph-testing-0001 6.13.0+ #137 SMP PREEMPT_DYNAMIC Mon Feb 3 20:30:08 UTC 2025 MKFS_OPTIONS -- 127.0.0.1:40551:/scratch MOUNT_OPTIONS -- -o name=fs,secret=<secret>,ms_mode=crc,nowsync,copyfrom 127.0.0.1:40551:/scratch /mnt/scratch generic/421 7s ... 4s Ran: generic/421 Passed all 1 tests * patches from https://lore.kernel.org/r/20250205000249.123054-1-slava@dubeyko.com: ceph: fix generic/421 test failure ceph: introduce ceph_submit_write() method ceph: introduce ceph_process_folio_batch() method ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring Link: https://lore.kernel.org/r/20250205000249.123054-1-slava@dubeyko.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: fix generic/421 test failureViacheslav Dubeyko
The generic/421 fails to finish because of the issue: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.894678] INFO: task kworker/u48:0:11 blocked for more than 122 seconds. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895403] Not tainted 6.13.0-rc5+ #1 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896633] task:kworker/u48:0 state:D stack:0 pid:11 tgid:11 ppid:2 flags:0x00004000 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896641] Workqueue: writeback wb_workfn (flush-ceph-24) Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897614] Call Trace: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897620] <TASK> Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897629] __schedule+0x443/0x16b0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897637] schedule+0x2b/0x140 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897640] io_schedule+0x4c/0x80 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897643] folio_wait_bit_common+0x11b/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897646] ? _raw_spin_unlock_irq+0xe/0x50 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897652] ? __pfx_wake_page_function+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897655] __folio_lock+0x17/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897658] ceph_writepages_start+0xca9/0x1fb0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897663] ? fsnotify_remove_queued_event+0x2f/0x40 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897668] do_writepages+0xd2/0x240 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897672] __writeback_single_inode+0x44/0x350 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897675] writeback_sb_inodes+0x25c/0x550 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897680] wb_writeback+0x89/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897683] ? finish_task_switch.isra.0+0x97/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897687] wb_workfn+0xb5/0x410 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897689] process_one_work+0x188/0x3d0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897692] worker_thread+0x2b5/0x3c0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897694] ? __pfx_worker_thread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897696] kthread+0xe1/0x120 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897699] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897701] ret_from_fork+0x43/0x70 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897705] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897707] ret_from_fork_asm+0x1a/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897711] </TASK> There are several issues here: (1) ceph_kill_sb() doesn't wait ending of flushing all dirty folios/pages because of racy nature of mdsc->stopping_blockers. As a result, mdsc->stopping becomes CEPH_MDSC_STOPPING_FLUSHED too early. (2) The ceph_inc_osd_stopping_blocker(fsc->mdsc) fails to increment mdsc->stopping_blockers. Finally, already locked folios/pages are never been unlocked and the logic tries to lock the same page second time. (3) The folio_batch with found dirty pages by filemap_get_folios_tag() is not processed properly. And this is why some number of dirty pages simply never processed and we have dirty folios/pages after unmount anyway. This patch fixes the issues by means of: (1) introducing dirty_folios counter and flush_end_wq waiting queue in struct ceph_mds_client; (2) ceph_dirty_folio() increments the dirty_folios counter; (3) writepages_finish() decrements the dirty_folios counter and wake up all waiters on the queue if dirty_folios counter is equal or lesser than zero; (4) adding in ceph_kill_sb() method the logic of checking the value of dirty_folios counter and waiting if it is bigger than zero; (5) adding ceph_inc_osd_stopping_blocker() call in the beginning of the ceph_writepages_start() and ceph_dec_osd_stopping_blocker() at the end of the ceph_writepages_start() with the goal to resolve the racy nature of mdsc->stopping_blockers. sudo ./check generic/421 FSTYP -- ceph PLATFORM -- Linux/x86_64 ceph-testing-0001 6.13.0+ #137 SMP PREEMPT_DYNAMIC Mon Feb 3 20:30:08 UTC 2025 MKFS_OPTIONS -- 127.0.0.1:40551:/scratch MOUNT_OPTIONS -- -o name=fs,secret=<secret>,ms_mode=crc,nowsync,copyfrom 127.0.0.1:40551:/scratch /mnt/scratch generic/421 7s ... 4s Ran: generic/421 Passed all 1 tests Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Link: https://lore.kernel.org/r/20250205000249.123054-5-slava@dubeyko.com Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: introduce ceph_submit_write() methodViacheslav Dubeyko
Final responsibility of ceph_writepages_start() is to submit write requests for processed dirty folios/pages. The ceph_submit_write() summarize all this logic in one method. The generic/421 fails to finish because of the issue: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.894678] INFO: task kworker/u48:0:11 blocked for more than 122 seconds. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895403] Not tainted 6.13.0-rc5+ #1 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896633] task:kworker/u48:0 state:D stack:0 pid:11 tgid:11 ppid:2 flags:0x00004000 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896641] Workqueue: writeback wb_workfn (flush-ceph-24) Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897614] Call Trace: Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897620] <TASK> Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897629] __schedule+0x443/0x16b0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897637] schedule+0x2b/0x140 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897640] io_schedule+0x4c/0x80 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897643] folio_wait_bit_common+0x11b/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897646] ? _raw_spin_unlock_irq+0xe/0x50 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897652] ? __pfx_wake_page_function+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897655] __folio_lock+0x17/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897658] ceph_writepages_start+0xca9/0x1fb0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897663] ? fsnotify_remove_queued_event+0x2f/0x40 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897668] do_writepages+0xd2/0x240 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897672] __writeback_single_inode+0x44/0x350 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897675] writeback_sb_inodes+0x25c/0x550 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897680] wb_writeback+0x89/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897683] ? finish_task_switch.isra.0+0x97/0x310 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897687] wb_workfn+0xb5/0x410 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897689] process_one_work+0x188/0x3d0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897692] worker_thread+0x2b5/0x3c0 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897694] ? __pfx_worker_thread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897696] kthread+0xe1/0x120 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897699] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897701] ret_from_fork+0x43/0x70 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897705] ? __pfx_kthread+0x10/0x10 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897707] ret_from_fork_asm+0x1a/0x30 Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897711] </TASK> There are two problems here: if (!ceph_inc_osd_stopping_blocker(fsc->mdsc)) { rc = -EIO; goto release_folios; } (1) ceph_kill_sb() doesn't wait ending of flushing all dirty folios/pages because of racy nature of mdsc->stopping_blockers. As a result, mdsc->stopping becomes CEPH_MDSC_STOPPING_FLUSHED too early. (2) The ceph_inc_osd_stopping_blocker(fsc->mdsc) fails to increment mdsc->stopping_blockers. Finally, already locked folios/pages are never been unlocked and the logic tries to lock the same page second time. This patch implements refactoring of ceph_submit_write() and also it solves the second issue. Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Link: https://lore.kernel.org/r/20250205000249.123054-4-slava@dubeyko.com Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: introduce ceph_process_folio_batch() methodViacheslav Dubeyko
First step of ceph_writepages_start() logic is of finding the dirty memory folios and processing it. This patch introduces ceph_process_folio_batch() method that moves this logic into dedicated method. The ceph_writepages_start() has this logic: if (ceph_wbc.locked_pages == 0) lock_page(page); /* first page */ else if (!trylock_page(page)) break; <skipped> if (folio_test_writeback(folio) || folio_test_private_2(folio) /* [DEPRECATED] */) { if (wbc->sync_mode == WB_SYNC_NONE) { doutc(cl, "%p under writeback\n", folio); folio_unlock(folio); continue; } doutc(cl, "waiting on writeback %p\n", folio); folio_wait_writeback(folio); folio_wait_private_2(folio); /* [DEPRECATED] */ } The problem here that folio/page is locked here at first and it is by set_page_writeback(page) later before submitting the write request. The folio/page is unlocked by writepages_finish() after finishing the write request. It means that logic of checking folio_test_writeback() and folio_wait_writeback() never works because page is locked and it cannot be locked again until write request completion. However, for majority of folios/pages the trylock_page() is used. As a result, multiple threads can try to lock the same folios/pages multiple times even if they are under writeback already. It makes this logic more compute intensive than it is necessary. This patch changes this logic: if (folio_test_writeback(folio) || folio_test_private_2(folio) /* [DEPRECATED] */) { if (wbc->sync_mode == WB_SYNC_NONE) { doutc(cl, "%p under writeback\n", folio); folio_unlock(folio); continue; } doutc(cl, "waiting on writeback %p\n", folio); folio_wait_writeback(folio); folio_wait_private_2(folio); /* [DEPRECATED] */ } if (ceph_wbc.locked_pages == 0) lock_page(page); /* first page */ else if (!trylock_page(page)) break; This logic should exclude the ignoring of writeback state of folios/pages. Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Link: https://lore.kernel.org/r/20250205000249.123054-3-slava@dubeyko.com Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoringViacheslav Dubeyko
The ceph_writepages_start() has unreasonably huge size and complex logic that makes this method hard to understand. Current state of the method's logic makes bug fix really hard task. This patch extends the struct ceph_writeback_ctl with the goal to make ceph_writepages_start() method more compact and easy to understand by means of deep refactoring. Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Link: https://lore.kernel.org/r/20250205000249.123054-2-slava@dubeyko.com Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28ALSA: hda: Fix speakers on ASUS EXPERTBOOK P5405CSA 1.0Daniel Bárta
After some digging around I have found that this laptop has Cirrus's smart aplifiers connected to SPI bus (spi1-CSC3551:00-cs35l41-hda). To get them correctly detected and working I had to modify patch_realtek.c with ASUS EXPERTBOOK P5405CSA 1.0 SystemID (0x1043, 0x1f63) and add corresponding hda_quirk (ALC245_FIXUP_CS35L41_SPI_2). Signed-off-by: Daniel Bárta <daniel.barta@trustlab.cz> Link: https://patch.msgid.link/20250227161256.18061-2-daniel.barta@trustlab.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-28ALSA: hda/realtek: Fix Asus Z13 2025 audioAntheas Kapenekakis
Use the basic quirk for this type of amplifier. Sound works in speakers, headphones, and microphone. Whereas none worked before. Tested-by: Kyle Gospodnetich <me@kylegospodneti.ch> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> Link: https://patch.msgid.link/20250227175107.33432-3-lkml@antheas.dev Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-28ALSA: hda/realtek: Remove (revert) duplicate Ally X configAntheas Kapenekakis
In commit 1e9c708dc3ae ("ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects") Baojun adds a bunch of projects to the file, including for the Ally X. Turns out the initial Ally X was not sorted properly, so the kernel had 2 quirks for it. The previous quirk overrode the new one due to being earlier and they are different. When AB testing, the normal pin fixup seems to work ok but causes a bit of a minor popping. Given the other config is more complicated and may cause undefined behavior, revert it. Fixes: 1e9c708dc3ae ("ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects") Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> Link: https://patch.msgid.link/20250227175107.33432-2-lkml@antheas.dev Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-28leds: leds-st1202: Fix NULL pointer access on race conditionManuel Fombuena
st1202_dt_init() calls devm_led_classdev_register_ext() before the internal data structures are properly set up, so the LEDs become visible to user space while being partially initialized, leading to a window where trying to access them causes a NULL pointer access. Move devm_led_classdev_register_ext() from DT initialization to the end of the probe function when DT and hardware are fully initialized and ready to interact with user space. Fixes: 259230378c65 ("leds: Add LED1202 I2C driver") Signed-off-by: Manuel Fombuena <fombuena@outlook.com> Link: https://lore.kernel.org/r/CWLP123MB54732771AC0CE5491B3C84DCC5C32@CWLP123MB5473.GBRP123.PROD.OUTLOOK.COM Signed-off-by: Lee Jones <lee@kernel.org>
2025-02-27Merge tag 'drm-fixes-2025-02-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "This week's fixes pull, amdgpu mostly, with some xe and a few misc others, the fb defio fix is bit of a change, but it avoids some nasty NULL pointer crashes due to defio assuming page backing in places it didn't have pages. amdgpu: - Legacy dpm suspend/resume fix - Runtime PM fix for DELL G5 SE - MAINTAINERS updates - Enforce Isolation fixes - mailmap update - EDID reading i2c fix - PSR fix - eDP fix - HPD interrupt handling fix - Clear memory fix amdkfd: - MQD handling fix vkms: - fix rounding error imagination: - header fix nouveau: - connector status fix fb/defio: - NULL ptr fix for defio drivers i915: - Fix encoder HW state readout for DP UHBR MST xe: - OA uapi fix (Umesh) - Userptr related fixes - Remove a duplicated register entry - Scheduler related fix to prevent exec races when freeing it" * tag 'drm-fixes-2025-02-28' of https://gitlab.freedesktop.org/drm/kernel: (25 commits) drm/fbdev-dma: Add shadow buffering for deferred I/O drm/nouveau: Do not override forced connector status drm/i915/dp_mst: Fix encoder HW state readout for UHBR MST drm/xe: cancel pending job timer before freeing scheduler drm/xe/regs: remove a duplicate definition for RING_CTL_SIZE(size) drm/imagination: remove unnecessary header include path drm/amdgpu: init return value in amdgpu_ttm_clear_buffer drm/amd/display: Fix HPD after gpu reset drm/amd/display: add a quirk to enable eDP0 on DP1 drm/amd/display: Disable PSR-SU on eDP panels MAINTAINERS: Update AMDGPU DML maintainers info drm/amd/display: restore edid reading from a given i2c adapter mailmap: Add entry for Rodrigo Siqueira MAINTAINERS: Change my role from Maintainer to Reviewer drm/amdgpu/mes: keep enforce isolation up to date drm/amdgpu/gfx: only call mes for enforce isolation if supported MAINTAINERS: update amdgpu maintainers list drm/amdgpu: disable BAR resize on Dell G5 SE drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd amdgpu/pm/legacy: fix suspend/resume issues ...
2025-02-27stmmac: loongson: Pass correct arg to PCI functionPhilipp Stanner
pcim_iomap_regions() should receive the driver's name as its third parameter, not the PCI device's name. Define the driver name with a macro and use it at the appropriate places, including pcim_iomap_regions(). Cc: stable@vger.kernel.org # v5.14+ Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Tested-by: Henry Chen <chenx97@aosc.io> Link: https://patch.msgid.link/20250226085208.97891-2-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27nvmet-tcp: Fix a possible sporadic response drops in weakly ordered archMeir Elisha
The order in which queue->cmd and rcv_state are updated is crucial. If these assignments are reordered by the compiler, the worker might not get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the the correct reordering, set rcv_state using smp_store_release(). Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error") Signed-off-by: Meir Elisha <meir.elisha@volumez.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-27nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()Maurizio Lombardi
nvme_tcp_recv_pdu() doesn't check the validity of the header length. When header digests are enabled, a target might send a packet with an invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst() to access memory outside the allocated area and cause memory corruptions by overwriting it with the calculated digest. Fix this by rejecting packets with an unexpected header length. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-27ftrace: Avoid potential division by zero in function_stat_show()Nikolay Kuratov
Check whether denominator expression x * (x - 1) * 1000 mod {2^32, 2^64} produce zero and skip stddev computation in that case. For now don't care about rec->counter * rec->counter overflow because rec->time * rec->time overflow will likely happen earlier. Cc: stable@vger.kernel.org Cc: Wen Yang <wenyang@linux.alibaba.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250206090156.1561783-1-kniv@yandex-team.ru Fixes: e31f7939c1c27 ("ftrace: Avoid potential division by zero in function profiler") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27selftests/ftrace: Let fprobe test consider already enabled functionsHeiko Carstens
The fprobe test fails on Fedora 41 since the fprobe test assumption that the number of enabled_functions is zero before the test starts is not necessarily true. Some user space tools, like systemd, add BPF programs that attach to functions. Those will show up in the enabled_functions table and must be taken into account by the fprobe test. Therefore count the number of lines of enabled_functions before tests start, and use that as base when comparing expected results. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250226142703.910860-1-hca@linux.ibm.com Fixes: e85c5e9792b9 ("selftests/ftrace: Update fprobe test to check enabled_functions file") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27tracing: Fix bad hist from corrupting named_triggers listSteven Rostedt
The following commands causes a crash: ~# cd /sys/kernel/tracing/events/rcu/rcu_callback ~# echo 'hist:name=bad:keys=common_pid:onmax(bogus).save(common_pid)' > trigger bash: echo: write error: Invalid argument ~# echo 'hist:name=bad:keys=common_pid' > trigger Because the following occurs: event_trigger_write() { trigger_process_regex() { event_hist_trigger_parse() { data = event_trigger_alloc(..); event_trigger_register(.., data) { cmd_ops->reg(.., data, ..) [hist_register_trigger()] { data->ops->init() [event_hist_trigger_init()] { save_named_trigger(name, data) { list_add(&data->named_list, &named_triggers); } } } } ret = create_actions(); (return -EINVAL) if (ret) goto out_unreg; [..] ret = hist_trigger_enable(data, ...) { list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!) [..] out_unreg: event_hist_unregister(.., data) { cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] { list_for_each_entry(iter, &file->triggers, list) { if (!hist_trigger_match(data, iter, named_data, false)) <- never matches continue; [..] test = iter; } if (test && test->ops->free) <<<-- test is NULL test->ops->free(test) [event_hist_trigger_free()] { [..] if (data->name) del_named_trigger(data) { list_del(&data->named_list); <<<<-- NEVER gets removed! } } } } [..] kfree(data); <<<-- frees item but it is still on list The next time a hist with name is registered, it causes an u-a-f bug and the kernel can crash. Move the code around such that if event_trigger_register() succeeds, the next thing called is hist_trigger_enable() which adds it to the list. A bunch of actions is called if get_named_trigger_data() returns false. But that doesn't need to be called after event_trigger_register(), so it can be moved up, allowing event_trigger_register() to be called just before hist_trigger_enable() keeping them together and allowing the file->triggers to be properly populated. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250227163944.1c37f85f@gandalf.local.home Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers") Reported-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27mailmap: remove unwanted entry for Antonio QuartulliAntonio Quartulli
antonio@openvpn.net is still used for sending patches under the OpenVPN Inc. umbrella, therefore this address should not be re-mapped. Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20250227-b4-ovpn-v20-1-93f363310834@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge tag 'drm-xe-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes uAPI: - OA uapi fix (Umesh) Driver: - Userptr related fixes (Auld) - Remove a duplicated register entry (Mingong) - Scheduler related fix to prevent exec races when freeing it (Tejas) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z8CSqJre1VCjPXt2@intel.com
2025-02-28Merge tag 'drm-intel-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix encoder HW state readout for DP UHBR MST (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z8CRM7XzlerbWSJy@intel.com
2025-02-27nvme-tcp: Fix a C2HTermReq error messageMaurizio Lombardi
In H2CTermReq, a FES with value 0x05 means "R2T Limit Exceeded"; but in C2HTermReq the same value has a different meaning (Data Transfer Limit Exceeded). Fixes: 84e009042d0f ("nvme-tcp: add basic support for the C2HTermReq PDU") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-27nvmet: remove old function prototypeMaurizio Lombardi
nvmet_subsys_nsid_exists() doesn't exist anymore Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-28Merge tag 'drm-misc-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Fix a rounding error in vkms, a header fix for img, a connector status fix for nouveau, and a NULL pointer dereference fix for deferred IO drivers. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-antique-robust-earthworm-09dfd1@houat
2025-02-27Bluetooth: Add check for mgmt_alloc_skb() in mgmt_device_connected()Haoxiang Li
Add check for the return value of mgmt_alloc_skb() in mgmt_device_connected() to prevent null pointer dereference. Fixes: e96741437ef0 ("Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-27Bluetooth: Add check for mgmt_alloc_skb() in mgmt_remote_name()Haoxiang Li
Add check for the return value of mgmt_alloc_skb() in mgmt_remote_name() to prevent null pointer dereference. Fixes: ba17bb62ce41 ("Bluetooth: Fix skb allocation in mgmt_remote_name() & mgmt_device_connected()") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-27bluetooth: btusb: Initialize .owner field of force_poll_sync_fopsSalah Triki
Initialize .owner field of force_poll_sync_fops to THIS_MODULE in order to prevent btusb from being unloaded while its operations are in use. Fixes: 800fe5ec302e ("Bluetooth: btusb: Add support for queuing during polling interval") Signed-off-by: Salah Triki <salah.triki@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-28Merge tag 'amd-drm-fixes-6.14-2025-02-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.14-2025-02-26: amdgpu: - Legacy dpm suspend/resume fix - Runtime PM fix for DELL G5 SE - MAINTAINERS updates - Enforce Isolation fixes - mailmap update - EDID reading i2c fix - PSR fix - eDP fix - HPD interrupt handling fix - Clear memory fix amdkfd: - MQD handling fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226200342.3685347-1-alexander.deucher@amd.com