Age | Commit message (Collapse) | Author |
|
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-8-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-7-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
These vmcore defines are not related to /proc/kcore, move them out.
We'll move "struct vmcoredd_node" to vmcore.c, because it is only used
internally. While "struct vmcore" is only used internally for now,
we're planning on using it from inline functions in crash_dump.h next,
so move it to crash_dump.h.
While at it, rename "struct vmcore" to "struct vmcore_range", which is a
more suitable name and will make the usage of it outside of vmcore.c
clearer.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-6-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's use "vmcore: " as a prefix, converting the single "Kdump:
vmcore not initialized" one to effectively be "vmcore: not initialized".
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-5-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The vmcoredd_update_size() call and its effects (size/offset changes) are
currently completely unsynchronized, and will cause trouble when
performed concurrently, or when done while someone is already reading the
vmcore.
Let's protect all vmcore modifications by the vmcore_mutex, disallow vmcore
modifications while the vmcore is open, and warn on vmcore
modifications after the vmcore was already opened once: modifications
while the vmcore is open are unsafe, and modifications after the vmcore
was opened indicates trouble. Properly synchronize against concurrent
opening of the vmcore.
No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.
It's worth noting that modifications after the vmcore was opened are
completely unexpected, so failing if open, and warning if already opened
(+closed again) is good enough.
This change not only handles concurrent adding of device dumps +
concurrent reading of the vmcore properly, it also prepares for other
mechanisms that will modify the vmcore.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-4-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Now that we have a mutex that synchronizes against opening of the vmcore,
let's use that one to replace vmcoredd_mutex: there is no need to have
two separate ones.
This is a preparation for properly preventing vmcore modifications
after the vmcore was opened.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-3-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We want to protect vmcore modifications from concurrent opening of
the vmcore, and also serialize vmcore modification.
(a) We can currently modify the vmcore after it was opened. This can happen
if a vmcoredd is added after the vmcore module was initialized and
already opened by user space. We want to fix that and prepare for
new code wanting to serialize against concurrent opening.
(b) To handle it cleanly we need to protect the modifications against
concurrent opening. As the modifications end up allocating memory and
can sleep, we cannot rely on the spinlock.
Let's convert the spinlock into a mutex to prepare for further changes.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-2-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
In xfs_inactive(), xfs_reflink_cancel_cow_range() is called
without error handling, risking unnoticed failures and
inconsistent behavior compared to other parts of the code.
Fix this issue by adding an error handling for the
xfs_reflink_cancel_cow_range(), improving code robustness.
Fixes: 6231848c3aa5 ("xfs: check for cow blocks before trying to clear them")
Cc: stable@vger.kernel.org # v4.17
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_dax_write_iomap_end
In xfs_dax_write_iomap_end(), directly return the result of
xfs_reflink_cancel_cow_range() when !written, ensuring proper
error propagation and improving code robustness.
Fixes: ea6c49b784f0 ("xfs: support CoW in fsdax mode")
Cc: stable@vger.kernel.org # v6.0
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
STATUS_NOT_A_REPARSE_POINT indicates that object does not have reparse point
buffer attached, for example returned by FSCTL_GET_REPARSE_POINT.
Currently STATUS_NOT_A_REPARSE_POINT is translated to -EIO. Change it to
-ENODATA which better describe the situation when no reparse point is set.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We're currently debugging issues with rebalance, where it's not making
progress as quickly as it should be (or sometimes not at all).
Add the full data_update to the move_extent_finish tracepoint, so we can
check that the replicas we wrote match what we were supposed to do.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
System performance is particularly sensitive to journal write latency,
the number of outstanding journal writes is bounded and we can't issue
journal flushes until other journal writes have completed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
|
|
In /proc/<pid>/ksm_stat, add two extra ksm involvement items including
KSM_mergeable and KSM_merge_any. It helps administrators to better know
the system's KSM behavior at process level.
ksm_merge_any: yes/no
whether the process'mm is added by prctl() into the candidate list
of KSM or not, and fully enabled at process level.
ksm_mergeable: yes/no
whether any VMAs of the process'mm are currently applicable to KSM.
Purpose
=======
These two items are just to improve the observability of KSM at process
level, so that users can know if a certain process has enabled KSM.
For example, if without these two items, when we look at
/proc/<pid>/ksm_stat and there's no merging pages found, We are not sure
whether it is because KSM was not enabled or because KSM did not
successfully merge any pages.
Although "mg" in /proc/<pid>/smaps indicate VM_MERGEABLE, it's opaque
and not very obvious for non professionals.
[akpm@linux-foundation.org: wording tweaks, per David and akpm]
Link: https://lkml.kernel.org/r/20250110174034304QOb8eDoqtFkp3_t8mqnqc@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove highest_bit and lowest_bit. After the HDD allocation path has been
removed, the only purpose of these two fields is to determine whether the
device is full or not, which can instead be determined by checking the
inuse_pages.
Link: https://lkml.kernel.org/r/20250113175732.48099-6-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chis Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The previous commit removed the page_list argument from
alloc_pages_bulk_noprof() along with the alloc_pages_bulk_list() function.
Now that only the *_array() flavour of the API remains, we can do the
following renaming (along with the _noprof() ones):
alloc_pages_bulk_array -> alloc_pages_bulk
alloc_pages_bulk_array_mempolicy -> alloc_pages_bulk_mempolicy
alloc_pages_bulk_array_node -> alloc_pages_bulk_node
Link: https://lkml.kernel.org/r/275a3bbc0be20fbe9002297d60045e67ab3d4ada.1734991165.git.luizcap@redhat.com
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The old name "avoid_reserve" can be too generic and can be used wrongly in
the new call sites that want to allocate a hugetlb folio.
It's confusing on two things: (1) whether one can opt-in to avoid global
reservation, and (2) whether it should take more than one count.
In reality, this flag is only used in an extremely hacky path, in an
extremely hacky way in hugetlb CoW path only, and always use with 1 saying
"skip global reservation". Rename the flag to avoid future abuse of this
flag, making it a boolean so as to reflect its true representation that
it's not a counter. To make it even harder to abuse, add a comment above
the function to explain it.
Link: https://lkml.kernel.org/r/20250107204002.2683356-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Ackerley Tng <ackerleytng@google.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As of now during link list corruption it prints about cluprit address and
its wrong value, but sometime it is not enough to catch the actual issue
point.
If it prints allocation and free path of that corrupted node, it will be a
lot easier to find and fix the issues.
Adding the same information when data mismatch is found in link list
debug data:
[ 14.243055] slab kmalloc-32 start ffff0000cda19320 data offset 32 pointer offset 8 size 32 allocated at add_to_list+0x28/0xb0
[ 14.245259] __kmalloc_cache_noprof+0x1c4/0x358
[ 14.245572] add_to_list+0x28/0xb0
...
[ 14.248632] do_el0_svc_compat+0x1c/0x34
[ 14.249018] el0_svc_compat+0x2c/0x80
[ 14.249244] Free path:
[ 14.249410] kfree+0x24c/0x2f0
[ 14.249724] do_force_corruption+0xbc/0x100
...
[ 14.252266] el0_svc_common.constprop.0+0x40/0xe0
[ 14.252540] do_el0_svc_compat+0x1c/0x34
[ 14.252763] el0_svc_compat+0x2c/0x80
[ 14.253071] ------------[ cut here ]------------
[ 14.253303] list_del corruption. next->prev should be ffff0000cda192a8, but was 6b6b6b6b6b6b6b6b. (next=ffff0000cda19348)
[ 14.254255] WARNING: CPU: 3 PID: 84 at lib/list_debug.c:65 __list_del_entry_valid_or_report+0x158/0x164
Moved prototype of mem_dump_obj() to bug.h, as mm.h can not be included in
bug.h.
Link: https://lkml.kernel.org/r/20241230101043.53773-1-maninder1.s@samsung.com
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Rohit Thapliyal <r.thapliyal@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"Still no new features for this cycle, as some ongoing improvements
remain premature for now.
This includes a micro-optimization for the superblock checksum, along
with minor bugfixes and code cleanups, as usual:
- Micro-optimize superblock checksum
- Avoid overly large bvecs[] for file-backed mounts
- Some leftover folio conversion in z_erofs_bind_cache()
- Minor bugfixes and cleanups"
* tag 'erofs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: refine z_erofs_get_extent_compressedlen()
erofs: remove dead code in erofs_fc_parse_param
erofs: return SHRINK_EMPTY if no objects to free
erofs: convert z_erofs_bind_cache() to folios
erofs: tidy up zdata.c
erofs: get rid of `z_erofs_next_pcluster_t`
erofs: simplify z_erofs_load_compact_lcluster()
erofs: fix potential return value overflow of z_erofs_shrink_scan()
erofs: shorten bvecs[] for file-backed mounts
erofs: micro-optimize superblock checksum
fs: erofs: xattr.c change kzalloc to kcalloc
|
|
Running the preempt tiering tests with a lower than normal journal
reclaim delay turned up a shutdown hang - a lost wakeup, caused because
flushing a journal pin (e.g. key cache/write buffer) can generate a new
journal pin.
The "simple" fix of adding the correct wakeup didn't work because of
ordering issues; if we flush btree node pins too aggressively before
other pins have completed, we end up spinning where each flush iteration
generates new work.
So to fix this correctly:
- The list of flushed journal pins is now broken out by type, so that
we can wait for key cache/write buffer pin flushing to complete
before flushing dirty btree nodes
- A new closure_waitlist is added for bch2_journal_flush_pins; this one
is only used under or when we're taking the journal lock, so it's
pretty cheap to add rigorously correct wakeups to journal_pin_set()
and journal_pin_drop().
Additionally, bch2_journal_seq_pins_to_text() is moved to
journal_reclaim.c, where it belongs, along with a bit of other small
renaming and refactoring.
Besides fixing the hang, the better ordering between key cache/write
buffer flushing and btree node flushing should help or fix the "unmount
taking excessively long" a few users have been noticing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a regression from when these were switched to normal opts.h options.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Ratelimit them, and use the new bch2_write_op_error() helper that prints
path and file offset.
Reported-by: https://github.com/koverstreet/bcachefs/issues/819
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Remove hard-coded strings by using the str_yes_no() and str_no_yes()
helper functions.
Link: https://lkml.kernel.org/r/20250117091335.1189-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, LZ4_DISTANCE_MAX and LZ4_DECOMPRESS_INPLACE_MARGIN are
defined in the erofs subsystem for LZ4 in-place decompression, which is
somewhat unsuitable since they should belong to the LZ4 itself and
may change with future LZ4 codebase updates.
Move them to include/linux/lz4.h to match the upstream LZ4 library [1].
No logic changes.
[1] https://github.com/lz4/lz4/blob/v1.10.0/lib/lz4.h#L670
Link: https://lkml.kernel.org/r/20250114130454.1191150-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Cc; Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A minor issue with nilfs_rename, originating from an old ext2
implementation, is that the mtime is updated even if the rename target is
a directory and it is renamed within the same directory, rather than moved
to a different directory.
In this case, the child directory being renamed does not change in any
way, so changing its mtime is unnecessary according to the specification,
and can unnecessarily confuse backup tools.
In ext2, this issue was fixed by commit 39fe7557b4d6 ("ext2: Do not update
mtime of a moved directory") and a few subsequent fixes, but it remained
in nilfs2.
Fix this issue by not calling nilfs_set_link(), which rewrites the inode
number of the directory entry that refers to the parent directory, when
the move target is a directory and the source and destination are the same
directory.
Here, the directory to be moved only needs to be read if the inode number
of the parent directory is rewritten with nilfs_set_link, so also adjust
the execution conditions of the preparation work to avoid unnecessary
directory reads.
Link: https://lkml.kernel.org/r/20250111143518.7901-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix issues with rename operations".
This series fixes BUG_ON check failures reported by syzbot around rename
operations, and a minor behavioral issue where the mtime of a child
directory changes when it is renamed instead of moved.
This patch (of 2):
The directory manipulation routines nilfs_set_link() and
nilfs_delete_entry() rewrite the directory entry in the folio/page
previously read by nilfs_find_entry(), so error handling is omitted on the
assumption that nilfs_prepare_chunk(), which prepares the buffer for
rewriting, will always succeed for these. And if an error is returned, it
triggers the legacy BUG_ON() checks in each routine.
This assumption is wrong, as proven by syzbot: the buffer layer called by
nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may
fail due to metadata corruption or other reasons. This has been there all
along, but improved sanity checks and error handling may have made it more
reproducible in fuzzing tests.
Fix this issue by adding missing error paths in nilfs_set_link(),
nilfs_delete_entry(), and their caller nilfs_rename().
Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1
Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Also for comments that do not cause kernel-doc warnings (those that list
multiple error codes), revise the return value description style to match
Brian G.'s suggestion of "..., or one of the following negative error
codes on failure:".
Link: https://lkml.kernel.org/r/CAAq45aNh1qV8P6XgDhKeNstT=PvcPUaCXsAF-f9rvmzznsZL5A@mail.gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are a number of kernel-doc comments for functions that are missing
return values, which also causes a number of warnings when the kernel-doc
script is run with the "-Wall" option.
Fix this issue by adding proper return value descriptions, and improve
code maintainability.
Link: https://lkml.kernel.org/r/20250110010530.21872-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for the
rest.
Link: https://lkml.kernel.org/r/20250110010530.21872-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions for metadata file
functions other than sufile.
Link: https://lkml.kernel.org/r/20250110010530.21872-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for
sufile-related functions, eliminating a dozen warnings emitted by the
kernel-doc script.
Link: https://lkml.kernel.org/r/20250110010530.21872-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous patch to fix the ioctl return value descriptions,
this fixes the format of the return value descriptions for bmap (and
btree)-related functions, which was causing the kernel-doc script to emit
a number of warnings.
Link: https://lkml.kernel.org/r/20250110010530.21872-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix kernel-doc comments for function return values",
v2.
This series fixes the inadequacies in the return value descriptions in
nilfs2's kernel-doc comments (mainly incorrect formatting), as well as the
lack of return value descriptions themselves, and fixes most of the
remaining warnings that are output when the kernel-doc script is run with
the "-Wall" option.
This patch (of 7):
In the kernel-doc comments for functions, there are many cases where the
format of the return value description is inaccurate, such as "Return
Value: ...", which causes many warnings to be output when the kernel-doc
script is executed with the "-Wall" option.
This fixes such incorrectly formatted return value descriptions for ioctl
functions.
Link: https://lkml.kernel.org/r/20250110010530.21872-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers
attached to dirty data folios/pages, accesses the attached buffers without
locking the folios/pages.
For data cache, nilfs_clear_folio_dirty() may be called asynchronously
when the file system degenerates to read only, so
nilfs_lookup_dirty_data_buffers() still has the potential to cause use
after free issues when buffers lose the protection of their dirty state
midway due to this asynchronous clearing and are unintentionally freed by
try_to_free_buffers().
Eliminate this race issue by adjusting the lock section in this function.
Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: protect busy buffer heads from being force-cleared".
This series fixes the buffer head state inconsistency issues reported by
syzbot that occurs when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.
This patch (of 2):
Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.
One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, but it detects that the buffer
is not in the uptodate state:
WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520
fs/buffer.c:1177
...
Call Trace:
<TASK>
nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598
nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73
nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344
nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218
vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
do_mkdirat+0x264/0x3a0 fs/namei.c:4280
__do_sys_mkdirat fs/namei.c:4295 [inline]
__se_sys_mkdirat fs/namei.c:4293 [inline]
__x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:
WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089
nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089
...
Call Trace:
<TASK>
nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345
nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587
nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006
nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline]
nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115
nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline]
nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Both of these issues are caused by the callbacks that handle the
page/folio write requests, forcibly clear various states, including the
working state of the buffers they hold, at unexpected times when they
detect read-only fallback.
Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.
Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7
Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The parameter is not used in __ocfs2_mknod_locked(). So remove it.
No functional change.
Link: https://lkml.kernel.org/r/20250106140634.92241-1-glass.su@suse.com
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While running fstests generic/329, the kernel workqueue
quota_release_workfn is dead looping in calling ocfs2_release_dquot().
The ocfs2 state is already readonly but ocfs2_release_dquot wants to
start a transaction but fails and returns.
=====================================================================
[ 2918.123602 ][ T275 ] On-disk corruption discovered. Please run
fsck.ocfs2 once the filesystem is unmounted.
[ 2918.124034 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:765
ERROR: status = -30
[ 2918.124452 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:795
ERROR: status = -30
[ 2918.124883 ][ T275 ] (kworker/u135:1,275,11):ocfs2_start_trans:357
ERROR: status = -30
[ 2918.125276 ][ T275 ] OCFS2: abort (device dm-0): ocfs2_start_trans:
Detected aborted journal
[ 2918.125710 ][ T275 ] On-disk corruption discovered. Please run
fsck.ocfs2 once the filesystem is unmounted.
=====================================================================
ocfs2_release_dquot() is much like dquot_release(), which is called by
ext4 to handle similar situation. So here fix it by marking the dquot as
inactive like what dquot_release() does.
Link: https://lkml.kernel.org/r/20250106140653.92292-1-glass.su@suse.com
Fixes: 9e33d69f553a ("ocfs2: Implementation of local and global quota file handling")
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
syz reported:
(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: bug
expression: tl_count > ocfs2_truncate_recs_per_inode(osb->sb) ||
tl_count == 0
(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: Truncate
record count on #77 invalid wanted 39, actual 2087
------------[ cut here ]------------
kernel BUG at fs/ocfs2/alloc.c:5874!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5313 Comm: syz-executor404 Not tainted
6.12.0-rc5-syzkaller-00299-g11066801dd4b #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ocfs2_truncate_log_append+0x9a8/0x9c0 fs/ocfs2/alloc.c:5868
RSP: 0018:ffffc9000cf16f40 EFLAGS: 00010292
RAX: b4b54f1d10640800 RBX: 0000000000000027 RCX: b4b54f1d10640800
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000cf17070 R08: ffffffff8174a14c R09: 1ffff11003f8519a
R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff110085f5f58
R13: ffffff3800000000 R14: 000000000000004d R15: ffff8880438f0008
FS: 00005555722df380(0000) GS:ffff88801fc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000f000 CR3: 000000004010e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ocfs2_remove_btree_range+0x1303/0x1860 fs/ocfs2/alloc.c:5789
ocfs2_remove_inode_range+0xff3/0x29f0 fs/ocfs2/file.c:1907
ocfs2_reflink_remap_extent fs/ocfs2/refcounttree.c:4537 [inline]
ocfs2_reflink_remap_blocks+0xcd4/0x1f30 fs/ocfs2/refcounttree.c:4684
ocfs2_remap_file_range+0x5fa/0x8d0 fs/ocfs2/file.c:2736
vfs_copy_file_range+0xc07/0x1510 fs/read_write.c:1615
__do_sys_copy_file_range fs/read_write.c:1705 [inline]
__se_sys_copy_file_range+0x3f2/0x5d0 fs/read_write.c:1668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd327167af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe6b8e22e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146
RAX: ffffffffffffffda RBX: 00007fd3271b005e RCX: 00007fd327167af9
RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007fd3271de610 R08: 000000000000d8c2 R09: 0000000000000000
R10: 0000000020000640 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe6b8e24b8 R14: 0000000000000001 R15: 0000000000000001
</TASK>
The fuzz image has a truncate log inode whose tl_count is bigger than
ocfs2_truncate_recs_per_inode() so it triggers the BUG in
ocfs2_truncate_log_append().
As what the check in ocfs2_truncate_log_append() does, just do same check
into ocfs2_get_truncate_log_info when truncate log inode is reading in so
we can bail out earlier.
Link: https://lkml.kernel.org/r/20250108024119.60313-1-glass.su@suse.com
Signed-off-by: Su Yue <glass.su@suse.com>
Reported-by: Liebes Wang <wanghaichi0403@gmail.com>
Link: https://lore.kernel.org/ocfs2-devel/CADCV8souQhdP0RdQF1U7KTWtuHDfpn+3LnTt-EEuMmB-pMRrgQ@mail.gmail.com/T/#u
Reported-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com
Tested-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Correct the value of l_next_free_rec to l_count during the online check,
as done in the check_el() function in ocfs2_tools.
Link: https://lkml.kernel.org/r/20250106023432.1320904-2-sunjunchao2870@gmail.com
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Recently syzbot reported a use-after-free issue[1].
The root cause of the problem is that the journal inode recorded in this
file system image is corrupted. The value of
"di->id2.i_list.l_next_free_rec" is 8193, which is greater than the value
of "di->id2.i_list.l_count" (19).
To solve this problem, an additional check should be added within
ocfs2_get_clusters_nocache(). If the check fails, an error will be
returned and the file system will be set to read-only.
[1]: https://lore.kernel.org/all/67577778.050a0220.a30f1.01bc.GAE@google.com/T/
Link: https://lkml.kernel.org/r/20250106023432.1320904-1-sunjunchao2870@gmail.com
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reported-by: syzbot+2313dda4dc4885c93578@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2313dda4dc4885c93578
Tested-by: syzbot+2313dda4dc4885c93578@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
squashfs_fill_page is only used in this file, so make it static.
Use kmap_local instead of kmap_atomic, and return a bool so that
the caller can use folio_end_read() which saves an atomic operation
over calling folio_mark_uptodate() followed by folio_unlock().
[willy@infradead.org: fix polarity of "uptodate" Thanks to Ryan for testing]
Link: https://lkml.kernel.org/r/20250110163300.3346321-2-willy@infradead.org
Link: https://lkml.kernel.org/r/20241220224634.723899-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove accesses to page->index and page->mapping. Also use folio
APIs where available. This code still assumes order 0 folios.
[dan.carpenter@linaro.org: fix a NULL vs IS_ERR() bug]
Link: https://lkml.kernel.org/r/7b7f44d6-9153-4d7c-b65b-2d78febe6c7a@stanley.mountain
Link: https://lkml.kernel.org/r/20241220224634.723899-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove a few accesses to page->mapping.
Link: https://lkml.kernel.org/r/20241220224634.723899-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove an access to page->mapping.
Link: https://lkml.kernel.org/r/20241220224634.723899-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use modern folio APIs where they exist and convert back to struct
page for the internal functions.
Link: https://lkml.kernel.org/r/20241220224634.723899-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Update the compression algorithms supported, and the Squashfs website
location.
Link: https://lkml.kernel.org/r/20241229233752.54481-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If Squashfs has been configured to directly read datablocks into the page
cache (SQUASHFS_FILE_DIRECT), then the read_page cache is unnecessary.
This improvement is due to the following two commits, which added the
ability to read datablocks into the page cache when pages were missing,
enabling the fallback which used an intermediate buffer to be removed.
commit f268eedddf359 ("squashfs: extend "page actor" to handle missing pages")
commit 1bb1a07afad97 ("squashfs: don't use intermediate buffer if pages missing")
This reduces the amount of memory used when mounting a filesystem by
block_size * maximum number of threads.
Link: https://lkml.kernel.org/r/20241229233752.54481-3-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|