Age | Commit message | Author
2025-01-25 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds
Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service support for usermode VMM x86: - Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a direct call to kvm_tdp_page_fault() when RETPOLINE is enabled - Ensure that all SEV code is compiled out when disabled in Kconfig, even if building with less brilliant compilers - Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes - Use str_enabled_disabled() to replace open coded strings - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache prior to every VM-Enter - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU - Make the completion of hypercalls go through the complete_hypercall function pointer argument, no matter if the hypercall exits to userspace or not. 
Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace, and all the others did not; the new code need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not - As part of enabling TDX virtual machines, support separation of private/shared EPT into separate roots. When TDX is enabled, operations on private pages will need to go through the privileged TDX Module via SEAMCALLs; as a result, they are limited and relatively slow compared to reading a PTE. The patches included in 6.14 allow KVM to keep a mirror of the private EPT in host memory, and define entries in kvm_x86_ops to operate on external page tables such as the TDX private EPT - The recently introduced conversion of the NX-page reclamation kthread to vhost_task moved the task under the main process. The task was created as soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that didn't expect to see any child task of the VM process until it started creating its own userspace threads. In particular crosvm refuses to fork() if procfs shows any child task, so unbreak it by creating the task lazily. This is arguably a userspace bug, as there can be other kinds of legitimate worker tasks and they wouldn't impede fork(); but it's not like userspace has a way to distinguish kernel worker tasks right now. Should they show as "Kthread: 1" in proc/.../status? x86 - Intel: - Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit while L2 is active, which ultimately results in a hardware-accelerated L1 EOI effectively being lost - Honor event priority when emulating Posted Interrupt delivery during nested VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the interrupt - Rework KVM's processing of the Page-Modification Logging buffer to reap entries in the same order they were created, i.e.
to mark gfns dirty in the same order that hardware marked the page/PTE dirty - Misc cleanups Generic: - Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when setting memory regions and add a dedicated API for setting KVM-internal memory regions. The API can then explicitly disallow all flags for KVM-internal memory regions - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw - Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined - Gracefully handle xarray insertion failures; even though such failures are impossible in practice after xa_reserve(), reserving an entry is always followed by xa_store() which does not know (or differentiate) whether there was an xa_reserve() before or not RISC-V: - Zabha, Svvptc, and Ziccrse extension support for guests. None of them require anything in KVM except for detecting them and marking them as supported; Zabha adds byte and halfword atomic operations, while the others are markers for specific operation of the TLB and of LL/SC instructions respectively - Virtualize SBI system suspend extension for Guest/VM - Support firmware counters which can be used by the guests to collect statistics about traps that occur in the host Selftests: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended - Play nice with treewrite builds of unsupported architectures, e.g. 
arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel - Ensure that format strings for logging statements are checked by the compiler even when the logging statement itself is disabled - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. It seems that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that events are counting correctly without actually knowing what the events count given the underlying hardware; this can happen if Intel reuses a formerly microarchitecture-specific event encoding as an architectural event, as was the case for Top-Down Slots" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits) kvm: defer huge page recovery vhost task to later KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() KVM: Disallow all flags for KVM-internal memslots KVM: x86: Drop double-underscores from __kvm_set_memory_region() KVM: Add a dedicated API for setting KVM-internal memslots KVM: Assert slots_lock is held when setting memory regions KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API) LoongArch: KVM: Add hypercall service support for usermode VMM LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup() KVM: VMX: read the PML log in the same order as it was written KVM: VMX: refactor PML terminology KVM: VMX: Fix comment of handle_vmx_instruction() KVM: VMX: Reinstate __exit attribute for vmx_exit() KVM: SVM: Use str_enabled_disabled() helper 
in sev_hardware_setup() KVM: x86: Avoid double RDPKRU when loading host/guest PKRU KVM: x86: Use LVT_TIMER instead of an open coded literal RISC-V: KVM: Add new exit statstics for redirected traps RISC-V: KVM: Update firmware counters for various events RISC-V: KVM: Redirect instruction access fault trap to guest ...
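The selftests item above about checking format strings "even when the logging statement itself is disabled" can be pictured with a common C idiom. The following is only a sketch of that idiom, not the selftests' actual macros (which are not shown here); `LOG_ENABLED`, `log_debug()` and `sum_and_log()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical compile-time switch: 0 means logging is disabled. */
#define LOG_ENABLED 0

/*
 * The printf() call is still parsed and type-checked by the compiler
 * (so a format/argument mismatch can be diagnosed), but because the
 * condition is a constant 0 the call is never executed.
 */
#define log_debug(...)                      \
	do {                                \
		if (LOG_ENABLED)            \
			printf(__VA_ARGS__); \
	} while (0)

/* Hypothetical caller: computes a sum and logs it when logging is on. */
static int sum_and_log(int a, int b)
{
	int sum = a + b;

	assert(a + b == sum);
	log_debug("sum of %d and %d is %d\n", a, b, sum);
	return sum;
}
```

Compared with defining the macro to nothing when logging is off, this form keeps the arguments visible to the compiler, which is the property the pull message describes.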
2025-01-25 | Merge tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux | Linus Torvalds
Pull hyperv updates from Wei Liu:
 - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves)
 - Fixes for the Hyper-V VTL mode (Roman Kisel)
 - Fixes for cpu mask usage in Hyper-V code (Michael Kelley)
 - Document the guest VM hibernation behaviour (Michael Kelley)
 - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain)
* tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Documentation: hyperv: Add overview of guest VM hibernation
  hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()
  hyperv: Do not overlap the hvcall IO areas in get_vtl()
  hyperv: Enable the hypercall output page for the VTL mode
  hv_balloon: Fallback to generic_online_page() for non-HV hot added mem
  Drivers: hv: vmbus: Log on missing offers if any
  Drivers: hv: vmbus: Wait for boot-time offers during boot and resume
  uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup
  iommu/hyper-v: Don't assume cpu_possible_mask is dense
  Drivers: hv: Don't assume cpu_possible_mask is dense
  x86/hyperv: Don't assume cpu_possible_mask is dense
  hyperv: Remove the now unused hyperv-tlfs.h files
  hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h
  hyperv: Add new Hyper-V headers in include/hyperv
  hyperv: Clean up unnecessary #includes
  hyperv: Move hv_connection_id to hyperv-tlfs.h
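Three of the commits above share the title "Don't assume cpu_possible_mask is dense". The patches themselves are not shown here, so the following is only a sketch of the general pitfall such a title points at: a table indexed by CPU id must be sized by the highest possible id, not by the number of possible CPUs. The 64-bit mask and both helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits: how many CPUs are possible. */
static unsigned int count_possible(uint64_t possible_mask)
{
	unsigned int n = 0;

	while (possible_mask) {
		possible_mask &= possible_mask - 1; /* clear lowest set bit */
		n++;
	}
	return n;
}

/* Highest possible CPU id plus one: slots needed for an id-indexed table. */
static unsigned int slots_needed(uint64_t possible_mask)
{
	unsigned int n = 0;

	while (possible_mask) {
		possible_mask >>= 1;
		n++;
	}
	assert(n <= 64);
	return n;
}
```

With CPUs 0, 2 and 4 possible (mask 0x15), there are 3 CPUs but an id-indexed table needs 5 slots; sizing it by the count would let CPU 4 index past the end. For a dense mask such as 0x7 the two values agree, which is why the assumption can go unnoticed.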
2025-01-25 | LoongArch: Adjust SETUP_SLEEP and SETUP_WAKEUP | Huacai Chen
SETUP_SLEEP should only save the GPR context, which is symmetric to SETUP_WAKEUP, so move the acpi_saved_sp handling out of SETUP_SLEEP. Move "addi.d sp, sp, PT_SIZE" into SETUP_WAKEUP for the same reason. No functional changes. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | LoongArch: Refactor bug_handler() implementation | Huacai Chen
1. Return early for exceptions triggered in user mode, for all types. 2. Give fixup_exception() a chance to run for the default types (like S390). Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | LoongArch: Add pgprot_nx() implementation | Huacai Chen
Commit cca98e9f8b5ebcd964 ("mm: enforce that vmap can't map pages executable") enforces the W^X protection by not allowing remapping existing pages as executable. Add LoongArch bits so that LoongArch can benefit from the same protection. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | LoongArch: Correct the __switch_to() prototype in comments | Huacai Chen
Correct the __switch_to() prototype in comments so that it matches the declaration in switch_to.h. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | LoongArch: Correct the cacheinfo sharing information | Huacai Chen
SMT cores and their sibling cores share the same L1 and L2 private caches (of course last level cache is also shared), so correct the cacheinfo sharing information to let shared_cpu_map correctly reflect this relationship. Below is the output of "lscpu" on Loongson-3A6000 (4 cores, 8 threads).
1. Before patch:
  L1d: 512 KiB (8 instances)
  L1i: 512 KiB (8 instances)
  L2: 2 MiB (8 instances)
  L3: 16 MiB (1 instance)
2. After patch:
  L1d: 256 KiB (4 instances)
  L1i: 256 KiB (4 instances)
  L2: 1 MiB (4 instances)
  L3: 16 MiB (1 instance)
Reported-by: Chao Li <lichao@loongson.cn> Signed-off-by: Juxin Gao <gaojuxin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
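The change in the lscpu figures follows from how many distinct sharing masks exist. Below is a minimal sketch of that counting, assuming (hypothetically) that SMT siblings have adjacent CPU numbers, so that thread t belongs to core t / threads_per_core; the real LoongArch topology code is not shown here and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sharing mask of a private cache for one CPU. If the cache is treated as
 * per-thread (the "before" behaviour), only the CPU itself is in the mask;
 * if it is shared by SMT siblings (the "after" behaviour), every thread of
 * the same core is in the mask.
 */
static uint32_t shared_cpu_mask(unsigned int cpu, unsigned int threads_per_core,
				int shared_by_siblings)
{
	unsigned int first, t;
	uint32_t mask = 0;

	if (!shared_by_siblings)
		return UINT32_C(1) << cpu;

	first = (cpu / threads_per_core) * threads_per_core;
	for (t = 0; t < threads_per_core; t++)
		mask |= UINT32_C(1) << (first + t);
	return mask;
}

/* One cache instance per distinct mask; count each at its lowest CPU. */
static unsigned int count_cache_instances(unsigned int nr_cpus,
					  unsigned int threads_per_core,
					  int shared_by_siblings)
{
	unsigned int cpu, instances = 0;

	assert(nr_cpus <= 32 && threads_per_core >= 1);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		uint32_t mask = shared_cpu_mask(cpu, threads_per_core,
						shared_by_siblings);
		uint32_t lowest = mask & (~mask + 1);

		if (lowest == (UINT32_C(1) << cpu))
			instances++;
	}
	return instances;
}
```

For 8 threads with 2 threads per core this gives 8 instances when each thread is treated separately and 4 when siblings share. The lscpu totals above imply 64 KiB per L1d instance (512 KiB / 8), so the corrected total is 4 x 64 KiB = 256 KiB.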
2025-01-25 | LoongArch: Derive timer max_delta from PRCFG1's timer_bits | Jiaxun Yang
As per the arch spec, the maximum number of timer bits is configurable and should not be hardcoded in any way. Probe the timer bits from PRCFG1 and use that to determine the clockevent's max_delta, so as to conform to the spec. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
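As a rough illustration of deriving a maximum delta from a probed counter width rather than a constant, here is a minimal sketch. It assumes the probed value is simply the width of the timer in bits; how PRCFG1 encodes that field is not shown here, and `timer_max_delta()` is a hypothetical helper, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an n-bit down-counter can hold: 2^n - 1. */
static uint64_t timer_max_delta(unsigned int timer_bits)
{
	assert(timer_bits >= 1 && timer_bits <= 64);

	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	if (timer_bits == 64)
		return UINT64_MAX;
	return (UINT64_C(1) << timer_bits) - 1;
}
```

For example, a 48-bit timer yields 0xFFFFFFFFFFFF, while a narrower implementation gets a correspondingly smaller limit instead of an out-of-range one.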
2025-01-25 | LoongArch: Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled | Jiaxun Yang
When ARCH_IOREMAP is enabled, we are using always accessible DMW for ioremap(). It makes no sense to create a dedicated mapping for earlycon given that we can access the region via DMW. Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is selected. This can ease debugging for early mapping issues. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | LoongArch: Migrate to the generic rule for built-in DTB | Masahiro Yamada
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs. Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25 | Merge tag 'irq-core-2025-01-21' into loongarch-next | Huacai Chen
LoongArch architecture changes for 6.14 depend on the irq-core changes (AVECINTC fixes) to work well for NUMA, so merge them to create a base.
2025-01-25 | kdb: Remove unused flags stack | Dr. David Alan Gilbert
kdb_restore_flags() and kdb_save_flags() were added in 2010 by commit 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)") but have remained unused. Remove them, and their associated storage. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250112012049.319515-1-linux@treblig.org Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
2025-01-25 | kdb: use kmap_local_page() | Zhang Heng
Use kmap_local_page() instead of kmap_atomic() which has been deprecated. Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20241223085420.1815930-1-zhangheng@kylinos.cn Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
2025-01-24 | ocfs2: use str_yes_no() and str_no_yes() helper functions | Thorsten Blum
Remove hard-coded strings by using the str_yes_no() and str_no_yes() helper functions. Link: https://lkml.kernel.org/r/20250117091335.1189-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
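For readers unfamiliar with these helpers, the sketch below shows userspace stand-ins modelled on the kernel's string-choice helpers, together with the kind of call site they simplify. The ocfs2 call sites themselves are not shown in this log, so `describe()` and its arguments are purely hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for the kernel helpers of the same names. */
static const char *str_yes_no(bool v)
{
	return v ? "yes" : "no";
}

static const char *str_no_yes(bool v)
{
	return str_yes_no(!v);
}

/*
 * Hypothetical call site: instead of open coding
 *     flag ? "yes" : "no"
 * in every format call, the helper names the intent.
 */
static int describe(char *buf, size_t len, const char *what, bool flag)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s: %s", what, str_yes_no(flag));
}
```

str_no_yes() covers the inverted case, where a true condition should print "no".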
2025-01-24 | include/linux/lz4.h: add some missing macros | Gao Xiang
Currently, LZ4_DISTANCE_MAX and LZ4_DECOMPRESS_INPLACE_MARGIN are defined in the erofs subsystem for LZ4 in-place decompression, which is somewhat unsuitable since they belong to LZ4 itself and may change with future LZ4 codebase updates. Move them to include/linux/lz4.h to match the upstream LZ4 library [1]. No logic changes. [1] https://github.com/lz4/lz4/blob/v1.10.0/lib/lz4.h#L670 Link: https://lkml.kernel.org/r/20250114130454.1191150-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Yann Collet <yann.collet.73@gmail.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Chao Yu <chao@kernel.org> Cc: Yue Hu <zbestahu@gmail.com> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
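To give a feel for what the two macros express, here is a small sketch. The definitions below are written from memory of the upstream lz4.h referenced in [1] and should be checked against that header before being relied on; `lz4_inplace_buffer_size()` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, as recalled from upstream lz4.h; verify before use. */
#ifndef LZ4_DISTANCE_MAX
#define LZ4_DISTANCE_MAX 65535 /* history window size */
#endif
#ifndef LZ4_DECOMPRESS_INPLACE_MARGIN
#define LZ4_DECOMPRESS_INPLACE_MARGIN(compressedSize) \
	(((compressedSize) >> 8) + 32)
#endif

/*
 * Hypothetical helper: size of a buffer that holds the decompressed data
 * plus the safety margin needed when the compressed input sits at the end
 * of the same buffer (in-place decompression).
 */
static size_t lz4_inplace_buffer_size(size_t decompressed_size)
{
	size_t margin = LZ4_DECOMPRESS_INPLACE_MARGIN(decompressed_size);

	assert(decompressed_size + margin >= decompressed_size); /* no wrap */
	return decompressed_size + margin;
}
```

Under these assumed definitions a 4096-byte block needs a 48-byte margin. Because such numbers are properties of the LZ4 format and library, keeping them next to the LZ4 code rather than inside one filesystem is the point of the commit.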
2025-01-24 | Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent | Kemeng Shi
Apart from xas_squash_marks(), all functions use the xa_mark_t type to iterate over all possible marks. Use xa_mark_t in xas_squash_marks() as well to keep the code consistent. Link: https://lkml.kernel.org/r/20241213122523.12764-6-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | Xarray: remove repeat check in xas_squash_marks() | Kemeng Shi
The caller of xas_squash_marks() has already ensured that xas->xa_sibs is non-zero, so remove the repeated check of xas->xa_sibs in xas_squash_marks(). Link: https://lkml.kernel.org/r/20241213122523.12764-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | Xarray: distinguish large entries correctly in xas_split_alloc() | Kemeng Shi
Splitting does not support large entries that need two or more levels of xa_node to expand. In the case "xas->xa_shift + 2 * XA_CHUNK_SHIFT == order", two levels of xa_node are also needed to expand, so treat the entry as a large entry in that case as well. As the max order of a folio in the pagecache (MAX_PAGECACHE_ORDER) is <= (XA_CHUNK_SHIFT * 2 - 1), this change is more of a cleanup... Link: https://lkml.kernel.org/r/20241213122523.12764-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
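The boundary case can be checked with a little arithmetic. The sketch below assumes XA_CHUNK_SHIFT is 6 and that the old test was a strict comparison that missed the equality case; the commit text implies the latter but the code is not shown here, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define XA_CHUNK_SHIFT 6 /* assumption for this sketch */

/* Presumed old test: equality is not treated as "too large". */
static bool too_large_old(unsigned int shift, unsigned int order)
{
	return shift + 2 * XA_CHUNK_SHIFT < order;
}

/* Test as described by the commit: equality also needs two node levels. */
static bool too_large_new(unsigned int shift, unsigned int order)
{
	bool too_large = shift + 2 * XA_CHUNK_SHIFT <= order;

	/* Widening the test can only classify more entries as too large. */
	assert(too_large || !too_large_old(shift, order));
	return too_large;
}
```

With a chunk shift of 6 the two tests differ only at order 12 (for shift 0), while the page cache limit quoted in the commit is at most 2 * 6 - 1 = 11, which matches the message's remark that the change is closer to a cleanup.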
2025-01-24 | Xarray: move forward index correctly in xas_pause() | Kemeng Shi
After xas_load(), xas->index could point to the middle of the found multi-index entry, and the bits of xas->index below node->shift may be non-zero. A subsequent xas_pause() then moves xas->index forward by xa->node->shift without masking off the bits below node->shift, and thus skips some indices unexpectedly. Consider the following case, assuming XA_CHUNK_SHIFT is 4:
  xa_store_range(xa, 16, 31, ...)
  xa_store(xa, 32, ...)
  XA_STATE(xas, xa, 17);
  xas_for_each(&xas,...)
  xas_load(&xas) /* xas->index = 17, xas->xa_offset = 1, xas->xa_node->xa_shift = 4 */
  xas_pause() /* xas->index = 33, xas->xa_offset = 2, xas->xa_node->xa_shift = 4 */
As we can see, index 32 is skipped unexpectedly. Fix this by masking off the bits below node->xa_shift when moving the index forward in xas_pause(). For now, this will not cause serious problems; only minor problems, such as cachestat reporting fewer page statuses than it should, could happen. Link: https://lkml.kernel.org/r/20241213122523.12764-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
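The arithmetic in the example above can be reproduced in isolation. This is a minimal model of the index update only, not the real xas_pause(); the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Advance without masking: low bits of the index leak into the result. */
static unsigned long advance_unmasked(unsigned long index,
				      unsigned int old_offset,
				      unsigned int new_offset,
				      unsigned int shift)
{
	assert(new_offset >= old_offset);
	return index + ((unsigned long)(new_offset - old_offset) << shift);
}

/* Advance after clearing the bits below the node's shift. */
static unsigned long advance_masked(unsigned long index,
				    unsigned int old_offset,
				    unsigned int new_offset,
				    unsigned int shift)
{
	assert(new_offset >= old_offset);
	index &= ~0UL << shift;
	return index + ((unsigned long)(new_offset - old_offset) << shift);
}
```

With the values from the commit message (index 17, offset moving from 1 to 2, shift 4), the unmasked version lands on 33 and skips index 32, while the masked version first rounds 17 down to 16 and lands on 32.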
2025-01-24 | Xarray: do not return sibling entries from xas_find_marked() | Kemeng Shi
Patch series "Fixes and cleanups to xarray", v5.

This series contains some random fixes and cleanups to xarray. Patch 1-2 are fixes and patch 3-6 are cleanups. More details can be found in the respective patches.

This patch (of 5):

Similar to the issue fixed in commit cbc02854331ed ("XArray: Do not return sibling entries from xa_load()"), we may return sibling entries from xas_find_marked() as follows:

    Thread A:                                     Thread B:
                                                  xa_store_range(xa, entry, 6, 7, gfp);
                                                  xa_set_mark(xa, 6, mark)
    XA_STATE(xas, xa, 6);
    xas_find_marked(&xas, 7, mark);
    offset = xas_find_chunk(xas, advance, mark);
    [offset is 6 which points to a valid entry]
                                                  xa_store_range(xa, entry, 4, 7, gfp);
    entry = xa_entry(xa, node, 6);
    [entry is a sibling of 4]
    if (!xa_is_node(entry))
        return entry;

Skip sibling entries like xas_find() does, so that callers of xas_find_marked() never see a sibling entry; otherwise a caller may use the sibling entry as a valid entry and crash the kernel.

In addition, the load_race() test is modified to catch this issue, and the modified load_race() only passes once this fix is merged.

Here is an example of how this bug could be triggered in tmpfs, which enables large folios in the mapping. Let's take a look at the racers involved:

1. How pages could be created and dirtied in a shmem file:
    write
    ksys_write
    vfs_write
    new_sync_write
    shmem_file_write_iter
    generic_perform_write
    shmem_write_begin
    shmem_get_folio
    shmem_allowable_huge_orders
    shmem_alloc_and_add_folios
    shmem_alloc_folio
    __folio_set_locked
    shmem_add_to_page_cache
    XA_STATE_ORDER(..., index, order)
    xas_store()
    shmem_write_end
    folio_mark_dirty()

2. How dirty pages could be deleted in a shmem file:
    ioctl
    do_vfs_ioctl
    file_ioctl
    ioctl_preallocate
    vfs_fallocate
    shmem_fallocate
    shmem_truncate_range
    shmem_undo_range
    truncate_inode_folio
    filemap_remove_folio
    page_cache_delete
    xas_store(&xas, NULL);

3. How dirty pages could be searched locklessly:
    sync_file_range
    ksys_sync_file_range
    __filemap_fdatawrite_range
    filemap_fdatawrite_wbc
    do_writepages
    writeback_use_writepage
    writeback_iter
    writeback_get_folio
    filemap_get_folios_tag
    find_get_entry
    folio = xas_find_marked()
    folio_try_get(folio)

The kernel will crash as follows (steps in time order, each labelled with the racer that performs it):

    1. Create:  /* write page 2,3 */
                write
                ...
                shmem_write_begin
                XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
                xa_store(&xas, folio)
                shmem_write_end
                folio_mark_dirty()
    2. Search:  /* sync page 2 and page 3 */
                sync_file_range
                ...
                find_get_entry
                folio = xas_find_marked()
                /* offset will be 2 */
                offset = xas_find_chunk()
    3. Delete:  /* delete page 2 and page 3 */
                ioctl
                ...
                xas_store(&xas, NULL);
    1. Create:  /* write page 0-3 */
                write
                ...
                shmem_write_begin
                XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
                xa_store(&xas, folio)
                shmem_write_end
                folio_mark_dirty(folio)
    2. Search:  /* get sibling entry from offset 2 */
                entry = xa_entry(.., 2)
                /* use sibling entry as folio and crash kernel */
                folio_try_get(folio)

Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> [English fixes] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
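The idea of skipping sibling entries during a marked search can be sketched with a toy model. This is not the XArray data structure or its API: `struct slot`, `fill_race_state()` and both search functions are hypothetical, and the race is represented as a static snapshot in which a stale mark sits on a slot that has since become a sibling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSLOTS 8

/* Toy slot: either holds a value or only points back at a canonical slot. */
struct slot {
	bool is_sibling;        /* true: alias of slot 'canonical' */
	unsigned int canonical; /* meaningful when is_sibling */
	const char *value;      /* meaningful when !is_sibling */
	bool marked;
};

/* Snapshot after a 4-slot entry was stored over slots 4..7. */
static void fill_race_state(struct slot *s)
{
	unsigned int i;

	memset(s, 0, sizeof(*s) * NSLOTS);
	s[4].value = "folio";
	s[4].marked = true;
	for (i = 5; i < NSLOTS; i++) {
		s[i].is_sibling = true;
		s[i].canonical = 4;
	}
	s[6].marked = true; /* stale mark a racing reader may still see */
}

/* First marked slot at or after 'start', or NSLOTS if there is none. */
static unsigned int find_chunk(const struct slot *s, unsigned int start)
{
	unsigned int i;

	for (i = start; i < NSLOTS; i++)
		if (s[i].marked)
			return i;
	return NSLOTS;
}

/* Naive search: returns whatever sits in the marked slot. */
static const struct slot *find_marked_naive(const struct slot *s,
					    unsigned int start)
{
	unsigned int i = find_chunk(s, start);

	return i < NSLOTS ? &s[i] : NULL;
}

/* Search that never hands a sibling slot back to the caller. */
static const struct slot *find_marked_skip_siblings(const struct slot *s,
						    unsigned int start)
{
	unsigned int i;

	for (i = find_chunk(s, start); i < NSLOTS; i = find_chunk(s, i + 1))
		if (!s[i].is_sibling)
			return &s[i];
	return NULL;
}
```

In this model a caller of the naive search starting at slot 6 receives a sibling slot and would go on to treat it as a real entry, whereas the skipping variant returns nothing for that range and still finds the canonical slot 4 when the search starts at 0.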
2025-01-24 | ipc/util.c: complete the kernel-doc function descriptions | Randy Dunlap
Move the function descriptive comments so that they conform to kernel-doc format, eliminating the kernel-doc warnings. util.c:618: warning: missing initial short description on line: * ipc_obtain_object_idr util.c:640: warning: missing initial short description on line: * ipc_obtain_object_check Link: https://lkml.kernel.org/r/20250111062905.910576-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | gcov: clang: use correct function param names | Randy Dunlap
Fix the function parameter names to match the function so that the kernel-doc warnings disappear. clang.c:273: warning: Function parameter or struct member 'dst' not described in 'gcov_info_add' clang.c:273: warning: Function parameter or struct member 'src' not described in 'gcov_info_add' clang.c:273: warning: Excess function parameter 'dest' description in 'gcov_info_add' clang.c:273: warning: Excess function parameter 'source' description in 'gcov_info_add' Link: https://lkml.kernel.org/r/20250111062944.910638-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | latencytop: use correct kernel-doc format for func params | Randy Dunlap
Use a ':' instead of a '-' after function parameters to eliminate kernel-doc warnings. kernel/latencytop.c:177: warning: Function parameter or struct member 'tsk' not described in '__account_scheduler_latency' ../kernel/latencytop.c:177: warning: Function parameter or struct member 'usecs' not described in '__account_scheduler_latency' ../kernel/latencytop.c:177: warning: Function parameter or struct member 'inter' not described in '__account_scheduler_latency' Link: https://lkml.kernel.org/r/20250111063019.910730-1-rdunlap@infradead.org Fixes: ad0b0fd554df ("sched, latencytop: incorporate review feedback from Andrew Morton") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
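For reference, the parameter format that kernel-doc expects looks roughly like the sketch below. The function is hypothetical and exists only to carry the comment; it is not the latencytop code.

```c
#include <assert.h>

/**
 * scale_latency - convert a latency sample to microseconds
 * @nsecs: latency in nanoseconds
 * @round_up: non-zero to round up instead of truncating
 *
 * Each parameter line uses "@name:" followed by the description. Writing
 * "@name - description" instead is the form the commit above replaces.
 *
 * Return: the latency in microseconds.
 */
static unsigned long scale_latency(unsigned long nsecs, int round_up)
{
	unsigned long usecs = (nsecs + (round_up ? 999 : 0)) / 1000;

	assert(usecs <= nsecs);
	return usecs;
}
```

The short description on the first line uses " - " after the function name, which is the one place where the dash is the expected separator.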
2025-01-24 | minmax.h: remove some #defines that are only expanded once | David Laight
The bodies of __signed_type_use() and __unsigned_type_use() are much the same size as their names - so put the bodies in the only line that expands them. Similarly __signed_type() is defined separately for 64bit and then used exactly once just below. Change the test for __signed_type from CONFIG_64BIT to one based on gcc defined macros so that the code is valid if it gets used outside of a kernel build. Link: https://lkml.kernel.org/r/9386d1ebb8974fbabbed2635160c3975@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | minmax.h: simplify the variants of clamp() | David Laight
Always pass a 'type' through to __clamp_once(), pass '__auto_type' from clamp() itself. The expansion of __types_ok3() is reasonable so it isn't worth the added complexity of avoiding it when a fixed type is used for all three values. Link: https://lkml.kernel.org/r/8f69f4deac014f558bab186444bac2e8@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | minmax.h: move all the clamp() definitions after the min/max() ones | David Laight
At some point the definitions for clamp() got added in the middle of the ones for min() and max(). Re-order the definitions so they are more sensibly grouped. Link: https://lkml.kernel.org/r/8bb285818e4846469121c8abc3dfb6e2@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() | David Laight
Use BUILD_BUG_ON_MSG(statically_true(ulo > uhi), ...) for the sanity check of the bounds in clamp(). Gives better error coverage and one less expansion of the arguments. Link: https://lkml.kernel.org/r/34d53778977747f19cce2abb287bb3e6@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
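As a plain-function analogue of what clamp() computes and of the bounds sanity check, consider the sketch below. The kernel macro rejects provably inverted bounds at build time; this hypothetical `clamp_long()` can only check them at run time, so it illustrates the contract rather than the macro's implementation.

```c
#include <assert.h>

/* Restrict val to the closed range [lo, hi]; the bounds must be ordered. */
static long clamp_long(long val, long lo, long hi)
{
	assert(lo <= hi); /* run-time stand-in for the build-time check */

	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```

Calling it with lo greater than hi has no sensible answer, which is why catching such calls as early as possible, ideally at compile time, is worthwhile.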
2025-01-24 | minmax.h: reduce the #define expansion of min(), max() and clamp() | David Laight
Since the test for signed values being non-negative only relies on __builtin_constant_p() (not is_constexpr()) it can use the 'ux' variable instead of the caller supplied expression. This means that the #define parameters are only expanded twice. Once in the code and once quoted in the error message. Link: https://lkml.kernel.org/r/051afc171806425da991908ed8688a98@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
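As background for why a "signed value is non-negative" test matters at all, the sketch below shows the C conversion rule that makes mixed signed/unsigned comparisons hazardous. It illustrates the language behaviour only and is not how the minmax.h macros are implemented; both function names are hypothetical.

```c
#include <assert.h>

/*
 * What a mixed comparison effectively does: the int operand is converted
 * to unsigned int first, so a negative value becomes a very large one.
 * The cast is written out here to make that conversion visible.
 */
static int naive_less(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Comparison that handles the negative case before converting. */
static int safe_less(int a, unsigned int b)
{
	if (a < 0)
		return 1;
	assert(a >= 0);
	return (unsigned int)a < b;
}
```

For a = -1 and b = 1 the naive comparison says "not less" while the mathematically correct answer is "less". When a signed operand is known to be non-negative the two agree, so a mixed comparison is safe in that case.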
2025-01-24 | minmax.h: update some comments | David Laight
- Change three to several.
- Remove the comment about retaining constant expressions, no longer true.
- Realign to nearer 80 columns and break on major punctuation.
- Add a leading comment to the block before __signed_type() and __is_nonneg(). Otherwise the block explaining the cast is a bit 'floating'. Reword the rest of that comment to improve readability.
Link: https://lkml.kernel.org/r/85b050c81c1d4076aeb91a6cded45fee@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | minmax.h: add whitespace around operators and after commas | David Laight
Patch series "minmax.h: Cleanups and minor optimisations". Some tidyups and minor changes to minmax.h. This patch (of 7): Link: https://lkml.kernel.org/r/c50365d214e04f9ba256d417c8bebbc0@AcuMS.aculab.com Link: https://lkml.kernel.org/r/f04b2e1310244f62826267346fde0553@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | nilfs2: do not update mtime of renamed directory that is not moved | Ryusuke Konishi
A minor issue with nilfs_rename, originating from an old ext2 implementation, is that the mtime is updated even if the rename target is a directory and it is renamed within the same directory, rather than moved to a different directory. In this case, the child directory being renamed does not change in any way, so changing its mtime is unnecessary according to the specification, and can unnecessarily confuse backup tools. In ext2, this issue was fixed by commit 39fe7557b4d6 ("ext2: Do not update mtime of a moved directory") and a few subsequent fixes, but it remained in nilfs2. Fix this issue by not calling nilfs_set_link(), which rewrites the inode number of the directory entry that refers to the parent directory, when the move target is a directory and the source and destination are the same directory. Here, the directory to be moved only needs to be read if the inode number of the parent directory is rewritten with nilfs_set_link, so also adjust the execution conditions of the preparation work to avoid unnecessary directory reads. Link: https://lkml.kernel.org/r/20250111143518.7901-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 | nilfs2: handle errors that nilfs_prepare_chunk() may return | Ryusuke Konishi
Patch series "nilfs2: fix issues with rename operations". This series fixes BUG_ON check failures reported by syzbot around rename operations, and a minor behavioral issue where the mtime of a child directory changes when it is renamed instead of moved. This patch (of 2): The directory manipulation routines nilfs_set_link() and nilfs_delete_entry() rewrite the directory entry in the folio/page previously read by nilfs_find_entry(), so error handling is omitted on the assumption that nilfs_prepare_chunk(), which prepares the buffer for rewriting, will always succeed for these. And if an error is returned, it triggers the legacy BUG_ON() checks in each routine. This assumption is wrong, as proven by syzbot: the buffer layer called by nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may fail due to metadata corruption or other reasons. This has been there all along, but improved sanity checks and error handling may have made it more reproducible in fuzzing tests. Fix this issue by adding missing error paths in nilfs_set_link(), nilfs_delete_entry(), and their caller nilfs_rename(). Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1 Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24CREDITS: fix spelling mistakeTanya Agarwal
Fix spelling error identified using codespell tool. Link: https://lkml.kernel.org/r/20250111194709.51133-1-tanyaagarwal25699@gmail.com Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com> Cc: Anup Sharma <anupnewsmail@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: revise the return value description style for consistency.Ryusuke Konishi
Also for comments that do not cause kernel-doc warnings (those that list multiple error codes), revise the return value description style to match Brian G.'s suggestion of "..., or one of the following negative error codes on failure:". Link: https://lkml.kernel.org/r/CAAq45aNh1qV8P6XgDhKeNstT=PvcPUaCXsAF-f9rvmzznsZL5A@mail.gmail.com Link: https://lkml.kernel.org/r/20250110010530.21872-8-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
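As an illustration of the comment style being discussed, here is a small self-contained function with a kernel-doc comment whose return section lists several error codes. The function itself is hypothetical and not taken from nilfs2; only the "Return:" wording follows the phrase quoted in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/**
 * example_lookup() - find a key in a small table (hypothetical example)
 * @table: array to search
 * @len:   number of elements in @table
 * @key:   value to look for
 * @index: place to store the position of the match
 *
 * Return: 0 on success, or one of the following negative error codes on
 * failure:
 * * %-EINVAL	- @table or @index is NULL.
 * * %-ENOENT	- @key is not present in @table.
 */
int example_lookup(const int *table, size_t len, int key, size_t *index)
{
	size_t i;

	if (!table || !index)
		return -EINVAL;

	for (i = 0; i < len; i++) {
		if (table[i] == key) {
			*index = i;
			return 0;
		}
	}
	return -ENOENT;
}
```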
2025-01-24nilfs2: add missing return value kernel-doc descriptionsRyusuke Konishi
There are a number of kernel-doc comments for functions that are missing return values, which also causes a number of warnings when the kernel-doc script is run with the "-Wall" option. Fix this issue by adding proper return value descriptions, and improve code maintainability. Link: https://lkml.kernel.org/r/20250110010530.21872-7-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: correct return value kernel-doc descriptions for the restRyusuke Konishi
Similar to the previous changes to fix return value descriptions, this fixes the format of the return value descriptions for the remaining functions. Link: https://lkml.kernel.org/r/20250110010530.21872-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: correct return value kernel-doc descriptions for metadata filesRyusuke Konishi
Similar to the previous changes to fix return value descriptions, this fixes the format of the return value descriptions for metadata file functions other than sufile. Link: https://lkml.kernel.org/r/20250110010530.21872-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: correct return value kernel-doc descriptions for sufileRyusuke Konishi
Similar to the previous changes to fix return value descriptions, this fixes the format of the return value descriptions for sufile-related functions, eliminating a dozen warnings emitted by the kernel-doc script. Link: https://lkml.kernel.org/r/20250110010530.21872-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: correct return value kernel-doc descriptions for bmap functionsRyusuke Konishi
Similar to the previous patch to fix the ioctl return value descriptions, this fixes the format of the return value descriptions for bmap (and btree)-related functions, which was causing the kernel-doc script to emit a number of warnings. Link: https://lkml.kernel.org/r/20250110010530.21872-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: correct return value kernel-doc descriptions for ioctl functionsRyusuke Konishi
Patch series "nilfs2: fix kernel-doc comments for function return values", v2. This series fixes the inadequacies in the return value descriptions in nilfs2's kernel-doc comments (mainly incorrect formatting), as well as the lack of return value descriptions themselves, and fixes most of the remaining warnings that are output when the kernel-doc script is run with the "-Wall" option. This patch (of 7): In the kernel-doc comments for functions, there are many cases where the format of the return value description is inaccurate, such as "Return Value: ...", which causes many warnings to be output when the kernel-doc script is executed with the "-Wall" option. This fixes such incorrectly formatted return value descriptions for ioctl functions. Link: https://lkml.kernel.org/r/20250110010530.21872-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250110010530.21872-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: "Brian G ." <gissf1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24checkpatch: don't warn about extra parentheses in staging/Dan Carpenter
This "Unnecessary parentheses" warning is disabled for drivers/staging unless the --strict option is used. Really, we don't want it at all even if the --strict option is used. Link: https://lkml.kernel.org/r/c7278d21-d96c-4c1e-b3bf-f82b8decc5df@stanley.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24hung_task: add task->flags, blocked by coredump to logOxana Kharitonova
Resending this patch as I haven't received feedback on my initial submission https://lore.kernel.org/all/20241204182953.10854-1-oxana@cloudflare.com/ For processes that terminate abnormally, the kernel can produce a coredump if enabled. While the coredump is performed, the process and all its threads are put into the D state (TASK_UNINTERRUPTIBLE | TASK_FREEZABLE). On the other hand, the kernel thread khungtaskd monitors processes in the D state. If a task is stuck in the D state for more than kernel.hung_task_timeout_secs, a hung_task alert appears in the kernel log. The higher the memory usage of a process, the longer it takes to create the coredump, and the longer its tasks stay in the D state. We see hung_task alerts for processes with memory usage above 10 GB, although our kernel.hung_task_timeout_secs is 10 sec, whereas the default is 120 sec. Adding information to the log that the task is blocked by a coredump will help with monitoring. Another approach might be to completely filter out alerts for such tasks, but in that case we would lose transparency about what is putting pressure on some system resources, e.g. we saw an increase in I/O when a coredump occurs because it is written to disk. Additionally, it would be helpful to have task_struct->flags in the log from the function sched_show_task(). Currently it prints task_struct->thread_info->flags, which seems misleading as the line starts with "task:xxxx".
[akpm@linux-foundation.org: fix printk control string] Link: https://lkml.kernel.org/r/20250110160328.64947-1-oxana@cloudflare.com Signed-off-by: Oxana Kharitonova <oxana@cloudflare.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Segall <bsegall@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24checkpatch: remove migrated RCU APIs from deprecated_apisDavid Reaver
The deprecated_apis map was created in [1] so checkpatch would flag deprecated RCU APIs. These deprecated APIs have since been removed from the kernel. This patch removes them from this map so checkpatch doesn't waste time looking for them, and so readers of checkpatch looking for deprecated APIs don't waste time searching for them. Link: https://lore.kernel.org/all/20181111192904.3199-13-paulmck@linux.ibm.com/ [1] Link: https://lkml.kernel.org/r/20250108192456.47871-1-me@davidreaver.com Signed-off-by: David Reaver <me@davidreaver.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: protect access to buffers with no active referencesRyusuke Konishi
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers attached to dirty data folios/pages, accesses the attached buffers without locking the folios/pages. For data cache, nilfs_clear_folio_dirty() may be called asynchronously when the file system degenerates to read only, so nilfs_lookup_dirty_data_buffers() still has the potential to cause use after free issues when buffers lose the protection of their dirty state midway due to this asynchronous clearing and are unintentionally freed by try_to_free_buffers(). Eliminate this race issue by adjusting the lock section in this function. Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24nilfs2: do not force clear folio if buffer is referencedRyusuke Konishi
Patch series "nilfs2: protect busy buffer heads from being force-cleared". This series fixes the buffer head state inconsistency issues reported by syzbot that occur when the filesystem is corrupted and falls back to read-only, and the associated buffer head use-after-free issue. This patch (of 2): Syzbot has reported that after nilfs2 detects filesystem corruption and falls back to read-only, inconsistencies in the buffer state may occur. One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty() to set a data or metadata buffer as dirty, the call detects that the buffer is not in the uptodate state: WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520 fs/buffer.c:1177 ... Call Trace: <TASK> nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598 nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73 nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344 nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218 vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257 do_mkdirat+0x264/0x3a0 fs/namei.c:4280 __do_sys_mkdirat fs/namei.c:4295 [inline] __se_sys_mkdirat fs/namei.c:4293 [inline] __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The other is when nilfs_btree_propagate(), which propagates the dirty state to the ancestor nodes of a b-tree that point to a dirty buffer, detects that the origin buffer is not dirty, even though it should be: WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089 nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089 ...
Call Trace: <TASK> nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345 nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587 nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006 nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline] nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115 nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline] nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Both of these issues arise because the callbacks that handle page/folio write requests forcibly clear various states, including the working state of the buffers they hold, at unexpected times when they detect read-only fallback. Fix these issues by checking if the buffer is referenced before clearing the page/folio state, and skipping the clear if it is. Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7 Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422 Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
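One possible reading of "check whether a buffer is referenced, and skip the clear if it is" is sketched below as a userspace model. The struct and function are hypothetical; the kernel works with struct buffer_head and folio locks, and whether the real patch skips the whole folio or only individual buffers is not stated in the message, so the all-or-nothing behavior here is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal model of a buffer attached to a folio. */
struct buf_model {
	int refcount;	/* active references held by other code paths */
	bool dirty;
	bool uptodate;
};

/*
 * Clear the state of every buffer on a folio unless at least one of
 * them is still referenced. Returns true if the state was cleared and
 * false if the clear was skipped.
 */
bool clear_folio_buffers_model(struct buf_model *bufs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (bufs[i].refcount > 0)
			return false;	/* busy buffer: leave the state alone */
	}

	for (i = 0; i < n; i++) {
		bufs[i].dirty = false;
		bufs[i].uptodate = false;
	}
	return true;
}
```

In the model, a code path that holds a reference never sees its buffer lose the dirty or uptodate state underneath it, which is the inconsistency the two warnings report.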
2025-01-24ocfs2: remove parameter parent_fe_bh from __ocfs2_mknod_lockedSu Yue
The parameter is not used in __ocfs2_mknod_locked(). So remove it. No functional change. Link: https://lkml.kernel.org/r/20250106140634.92241-1-glass.su@suse.com Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24ocfs2: mark dquot as inactive if failed to start trans while releasing dquotSu Yue
While running fstests generic/329, the kernel workqueue quota_release_workfn loops endlessly calling ocfs2_release_dquot(). The ocfs2 filesystem is already read-only, but ocfs2_release_dquot() still tries to start a transaction, fails, and returns. ===================================================================== [ 2918.123602 ][ T275 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. [ 2918.124034 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:765 ERROR: status = -30 [ 2918.124452 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:795 ERROR: status = -30 [ 2918.124883 ][ T275 ] (kworker/u135:1,275,11):ocfs2_start_trans:357 ERROR: status = -30 [ 2918.125276 ][ T275 ] OCFS2: abort (device dm-0): ocfs2_start_trans: Detected aborted journal [ 2918.125710 ][ T275 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. ===================================================================== ocfs2_release_dquot() is much like dquot_release(), which is called by ext4 to handle a similar situation. Fix it by marking the dquot as inactive, as dquot_release() does. Link: https://lkml.kernel.org/r/20250106140653.92292-1-glass.su@suse.com Fixes: 9e33d69f553a ("ocfs2: Implementation of local and global quota file handling") Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
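The idea of marking the dquot inactive even when the transaction cannot be started can be modeled in userspace as follows. Everything here is a hypothetical stand-in: the flag bit, the struct, and the function are not the kernel's definitions, and -EROFS is assumed only because it matches the status of -30 shown in the log on Linux.

```c
#include <errno.h>
#include <stdbool.h>

#define DQ_ACTIVE_MODEL 0x1u	/* hypothetical bit, not the kernel's DQ_ACTIVE_B */

struct dquot_model {
	unsigned int flags;
	bool released_on_disk;	/* set when the on-disk release was performed */
};

/*
 * Release a dquot. If the filesystem is read-only the transaction
 * cannot start; the dquot is still marked inactive so that a release
 * worker has no reason to pick it up again and loop.
 */
int release_dquot_model(struct dquot_model *dq, bool fs_readonly)
{
	int status = fs_readonly ? -EROFS : 0;	/* stands in for starting a transaction */

	if (status == 0)
		dq->released_on_disk = true;

	dq->flags &= ~DQ_ACTIVE_MODEL;	/* cleared on success and on failure */
	return status;
}
```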
2025-01-24ocfs2: check tl->count of truncate log inode in ocfs2_get_truncate_log_infoSu Yue
syz reported: (syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: bug expression: tl_count > ocfs2_truncate_recs_per_inode(osb->sb) || tl_count == 0 (syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: Truncate record count on #77 invalid wanted 39, actual 2087 ------------[ cut here ]------------ kernel BUG at fs/ocfs2/alloc.c:5874! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5313 Comm: syz-executor404 Not tainted 6.12.0-rc5-syzkaller-00299-g11066801dd4b #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:ocfs2_truncate_log_append+0x9a8/0x9c0 fs/ocfs2/alloc.c:5868 RSP: 0018:ffffc9000cf16f40 EFLAGS: 00010292 RAX: b4b54f1d10640800 RBX: 0000000000000027 RCX: b4b54f1d10640800 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffffc9000cf17070 R08: ffffffff8174a14c R09: 1ffff11003f8519a R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff110085f5f58 R13: ffffff3800000000 R14: 000000000000004d R15: ffff8880438f0008 FS: 00005555722df380(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000f000 CR3: 000000004010e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ocfs2_remove_btree_range+0x1303/0x1860 fs/ocfs2/alloc.c:5789 ocfs2_remove_inode_range+0xff3/0x29f0 fs/ocfs2/file.c:1907 ocfs2_reflink_remap_extent fs/ocfs2/refcounttree.c:4537 [inline] ocfs2_reflink_remap_blocks+0xcd4/0x1f30 fs/ocfs2/refcounttree.c:4684 ocfs2_remap_file_range+0x5fa/0x8d0 fs/ocfs2/file.c:2736 vfs_copy_file_range+0xc07/0x1510 fs/read_write.c:1615 __do_sys_copy_file_range fs/read_write.c:1705 [inline] __se_sys_copy_file_range+0x3f2/0x5d0 fs/read_write.c:1668 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 
entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fd327167af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe6b8e22e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146 RAX: ffffffffffffffda RBX: 00007fd3271b005e RCX: 00007fd327167af9 RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007fd3271de610 R08: 000000000000d8c2 R09: 0000000000000000 R10: 0000000020000640 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffe6b8e24b8 R14: 0000000000000001 R15: 0000000000000001 </TASK> The fuzzed image has a truncate log inode whose tl_count is bigger than ocfs2_truncate_recs_per_inode(), so it triggers the BUG in ocfs2_truncate_log_append(). Perform the same check that ocfs2_truncate_log_append() does in ocfs2_get_truncate_log_info(), when the truncate log inode is read in, so we can bail out earlier. Link: https://lkml.kernel.org/r/20250108024119.60313-1-glass.su@suse.com Signed-off-by: Su Yue <glass.su@suse.com> Reported-by: Liebes Wang <wanghaichi0403@gmail.com> Link: https://lore.kernel.org/ocfs2-devel/CADCV8souQhdP0RdQF1U7KTWtuHDfpn+3LnTt-EEuMmB-pMRrgQ@mail.gmail.com/T/#u Reported-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com Tested-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
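The bug expression quoted in the report can be turned into an early validation step. The sketch below is a userspace model with a hypothetical function name; the returned error code (-EIO) is an assumption, since the message does not say what the kernel returns. The values 39 and 2087 in the usage note come from the log above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate a truncate log record count read from disk. max_recs stands
 * in for the value ocfs2_truncate_recs_per_inode() would return. The
 * condition mirrors the quoted bug expression:
 *   tl_count > max || tl_count == 0
 */
int check_truncate_log_count(uint16_t tl_count, uint16_t max_recs)
{
	if (tl_count > max_recs || tl_count == 0)
		return -EIO;	/* assumed error code for a corrupted inode */
	return 0;
}
```

With the reported numbers, check_truncate_log_count(2087, 39) fails at read time, so the append path would never be reached with the corrupted count.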
2025-01-24mailmap: update entry for Linus LüssingLinus Lüssing
Mapping another old, obsolete work email address to my primary one. Link: https://lkml.kernel.org/r/20250108035840.25194-1-linus.luessing@c0d3.blue Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24ocfs2: correct l_next_free_rec in online checkJulian Sun
Correct the value of l_next_free_rec to l_count during the online check, as done in the check_el() function in ocfs2_tools. Link: https://lkml.kernel.org/r/20250106023432.1320904-2-sunjunchao2870@gmail.com Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
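A minimal sketch of such a correction, under the assumption that it applies when l_next_free_rec exceeds l_count (the message does not spell out the condition). The struct is a hypothetical two-field model, not the on-disk ocfs2 extent list, and the function name is invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model holding only the two fields named in the message. */
struct extent_list_model {
	uint16_t l_count;		/* capacity of the record array */
	uint16_t l_next_free_rec;	/* index of the next unused record */
};

/*
 * Clamp l_next_free_rec to l_count when it points past the end of the
 * record array. Returns true if a correction was made.
 */
bool fix_next_free_rec(struct extent_list_model *el)
{
	if (el->l_next_free_rec > el->l_count) {
		el->l_next_free_rec = el->l_count;
		return true;
	}
	return false;
}
```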