Age | Commit message (Collapse) | Author |
|
With maple_tree supporting vma tree traversal under RCU and per-vma locks,
/proc/pid/maps can be read while holding individual vma locks instead of
locking the entire address space.
A completely lockless approach (walking vma tree under RCU) would be quite
complex with the main issue being get_vma_name() using callbacks which
might not work correctly with a stable vma copy, requiring original
(unstable) vma - see special_mapping_name() for example.
When per-vma lock acquisition fails, we take the mmap_lock for reading,
lock the vma, release the mmap_lock and continue. This fallback to mmap
read lock guarantees the reader to make forward progress even during lock
contention. This will interfere with the writer but for a very short time
while we are acquiring the per-vma lock and only when there was contention
on the vma reader is interested in.
We shouldn't see a repeated fallback to mmap read locks in practice, as
this require a very unlikely series of lock contentions (for instance due
to repeated vma split operations). However even if this did somehow
happen, we would still progress.
One case requiring special handling is when a vma changes between the time
it was found and the time it got locked. A problematic case would be if a
vma got shrunk so that its vm_start moved higher in the address space and
a new vma was installed at the beginning:
reader found: |--------VMA A--------|
VMA is modified: |-VMA B-|----VMA A----|
reader locks modified VMA A
reader reports VMA A: | gap |----VMA A----|
This would result in reporting a gap in the address space that does not
exist. To prevent this we retry the lookup after locking the vma, however
we do that only when we identify a gap and detect that the address space
was changed after we found the vma.
This change is designed to reduce mmap_lock contention and prevent a
process reading /proc/pid/maps files (often a low priority task, such as
monitoring/data collection services) from blocking address space updates.
Note that this change has a userspace visible disadvantage: it allows for
sub-page data tearing as opposed to the previous mechanism where data
tearing could happen only between pages of generated output data. Since
current userspace considers data tearing between pages to be acceptable,
we assume is will be able to handle sub-page data tearing as well.
Link: https://lkml.kernel.org/r/20250719182854.3166724-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Back in 2.6 era, last_addr used to be stored in seq_file->version
variable, which was unsigned long. As a result, sentinels to represent
gate vma and end of all vmas used unsigned values. In more recent kernels
we don't used seq_file->version anymore and therefore conversion from
loff_t into unsigned type is not needed. Similarly, sentinel values don't
need to be unsigned. Remove type conversion for set_file position and
change sentinel values to signed. While at it, change the hardcoded
sentinel values with named definitions for better documentation.
Link: https://lkml.kernel.org/r/20250719182854.3166724-6-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A race condition is possible in stable_page_flags() where user-space is
reading /proc/kpageflags concurrently to a folio split. This may lead to
oopses or BUG_ON()s being triggered.
To fix this, this commit uses snapshot_page() in stable_page_flags() so
that stable_page_flags() works with a stable page and folio snapshots
instead.
Note that stable_page_flags() makes use of some functions that require the
original page or folio pointer to work properly (eg. is_free_budy_page()
and folio_test_idle()). Since those functions can't be used on the page
snapshot, we replace their usage with flags that were set by
snapshot_page() for this purpose.
Link: https://lkml.kernel.org/r/52c16c0f00995a812a55980c2f26848a999a34ab.1752499009.git.luizcap@redhat.com
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, the call to folio_precise_page_mapcount() from kpage_read() can
race with a folio split. When the race happens we trigger a
VM_BUG_ON_FOLIO() in folio_entire_mapcount() (see splat below).
This commit fixes this race by using snapshot_page() so that we retrieve
the folio mapcount using a folio snapshot.
[ 2356.558576] page: refcount:1 mapcount:1 mapping:0000000000000000 index:0xffff85200 pfn:0x6f7c00
[ 2356.558748] memcg:ffff000651775780
[ 2356.558763] anon flags: 0xafffff60020838(uptodate|dirty|lru|owner_2|swapbacked|node=1|zone=2|lastcpupid=0xfffff)
[ 2356.558796] raw: 00afffff60020838 fffffdffdb5d0048 fffffdffdadf7fc8 ffff00064c1629c1
[ 2356.558817] raw: 0000000ffff85200 0000000000000000 0000000100000000 ffff000651775780
[ 2356.558839] page dumped because: VM_BUG_ON_FOLIO(!folio_test_large(folio))
[ 2356.558882] ------------[ cut here ]------------
[ 2356.558897] kernel BUG at ./include/linux/mm.h:1103!
[ 2356.558982] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 2356.564729] CPU: 8 UID: 0 PID: 1864 Comm: folio-split-rac Tainted: G S W 6.15.0+ #3 PREEMPT(voluntary)
[ 2356.566196] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 2356.566814] Hardware name: Red Hat KVM, BIOS edk2-20241117-3.el9 11/17/2024
[ 2356.567684] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2356.568563] pc : kpage_read.constprop.0+0x26c/0x290
[ 2356.569605] lr : kpage_read.constprop.0+0x26c/0x290
[ 2356.569992] sp : ffff80008fb739b0
[ 2356.570263] x29: ffff80008fb739b0 x28: ffff00064aa69580 x27: 00000000ff000000
[ 2356.570842] x26: 0000fffffffffff8 x25: ffff00064aa69580 x24: ffff80008fb73ae0
[ 2356.571411] x23: 0000000000000001 x22: 0000ffff86c6e8b8 x21: 0000000000000008
[ 2356.571978] x20: 00000000006f7c00 x19: 0000ffff86c6e8b8 x18: 0000000000000000
[ 2356.572581] x17: 3630303066666666 x16: 0000000000000003 x15: 0000000000001000
[ 2356.573217] x14: 00000000ffffffff x13: 0000000000000004 x12: 00aaaaaa00aaaaaa
[ 2356.577674] x11: 0000000000000000 x10: 00aaaaaa00aaaaaa x9 : ffffbf3afca6c300
[ 2356.578332] x8 : 0000000000000002 x7 : 0000000000000001 x6 : 0000000000000001
[ 2356.578984] x5 : ffff000c79812408 x4 : 0000000000000000 x3 : 0000000000000000
[ 2356.579635] x2 : 0000000000000000 x1 : ffff00064aa69580 x0 : 000000000000003e
[ 2356.580286] Call trace:
[ 2356.580524] kpage_read.constprop.0+0x26c/0x290 (P)
[ 2356.580982] kpagecount_read+0x28/0x40
[ 2356.581336] proc_reg_read+0x38/0x100
[ 2356.581681] vfs_read+0xcc/0x320
[ 2356.581992] ksys_read+0x74/0x118
[ 2356.582306] __arm64_sys_read+0x24/0x38
[ 2356.582668] invoke_syscall+0x70/0x100
[ 2356.583022] el0_svc_common.constprop.0+0x48/0xf8
[ 2356.583456] do_el0_svc+0x28/0x40
[ 2356.583930] el0_svc+0x38/0x118
[ 2356.584328] el0t_64_sync_handler+0x144/0x168
[ 2356.584883] el0t_64_sync+0x19c/0x1a0
[ 2356.585350] Code: aa0103e0 9003a541 91082021 97f813fc (d4210000)
[ 2356.586130] ---[ end trace 0000000000000000 ]---
[ 2356.587377] note: folio-split-rac[1864] exited with irqs disabled
[ 2356.588050] note: folio-split-rac[1864] exited with preempt_count 1
Link: https://lkml.kernel.org/r/1c05cc725b90962d56323ff2e28e9cc3ae397b68.1752499009.git.luizcap@redhat.com
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Reported-by: syzbot+3d7dc5eaba6b932f8535@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67812fbd.050a0220.d0267.0030.GAE@google.com/Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Right now it appears that the code is relying upon the returned
destination address having bits outside PAGE_MASK to indicate whether an
error value is specified, and decrementing the increased refcount on the
uffd ctx if so.
This is not a safe means of determining an error value, so instead, be
specific. It makes far more sense to do so in a dedicated error path, so
add mremap_userfaultfd_fail() for this purpose and use this when an error
arises.
A vm_userfaultfd_ctx is not established until we are at the point where
mremap_userfaultfd_prep() is invoked in copy_vma_and_data(), so this is a
no-op until this happens.
That is - uffd remap notification only occurs if the VMA is actually moved
- at which point a UFFD_EVENT_REMAP event is raised.
No errors can occur after this point currently, though it's certainly not
guaranteed this will always remain the case, and we mustn't rely on this.
However, the reason for needing to handle this case is that, when an error
arises on a VMA move at the point of adjusting page tables, we revert this
operation, and propagate the error.
At this point, it is not correct to raise a uffd remap event, and we must
handle it.
This refactoring makes it abundantly clear what we are doing.
We assume vrm->new_addr is always valid, which a prior change made the
case even for mremap() invocations which don't move the VMA, however given
no uffd context would be set up in this case it's immaterial to this
change anyway.
No functional change intended.
Link: https://lkml.kernel.org/r/a70e8a1f7bce9f43d1431065b414e0f212297297.1752770784.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Stop using the obsolete write_cache_pages and use writeback_iter directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc8).
Conflicts:
drivers/net/ethernet/microsoft/mana/gdma_main.c
9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion")
755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/ti/icssg/icssg_prueth.h
6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function ksmbd_vfs_kern_path_locked() seems to serve two functions
and as a result has an odd interface.
On success it returns with the parent directory locked and with write
access on that filesystem requested, but it may have crossed over a
mount point to return the path, which makes the lock and the write
access irrelevant.
This patches separates the functionality into two functions:
- ksmbd_vfs_kern_path() does not lock the parent, does not request
write access, but does cross mount points
- ksmbd_vfs_kern_path_locked() does not cross mount points but
does lock the parent and request write access.
The parent_path parameter is no longer needed. For the _locked case
the final path is sufficient to drop write access and to unlock the
parent (using path->dentry->d_parent which is safe while the lock is
held).
There were 3 caller of ksmbd_vfs_kern_path_locked().
- smb2_create_link() needs to remove the target if it existed and
needs the lock and the write-access, so it continues to use
ksmbd_vfs_kern_path_locked(). It would not make sense to
cross mount points in this case.
- smb2_open() is the only user that needs to cross mount points
and it has no need for the lock or write access, so it now uses
ksmbd_vfs_kern_path()
- smb2_creat() does not need to cross mountpoints as it is accessing
a file that it has just created on *this* filesystem. But also it
does not need the lock or write access because by the time
ksmbd_vfs_kern_path_locked() was called it has already created the
file. So it could use either interface. It is simplest to use
ksmbd_vfs_kern_path().
ksmbd_vfs_kern_path_unlock() is still needed after
ksmbd_vfs_kern_path_locked() but it doesn't require the parent_path any
more. After a successful call to ksmbd_vfs_kern_path(), only path_put()
is needed to release the path.
Signed-off-by: NeilBrown <neil@brown.name>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ri_buf just holds a pointer/len pair and is not a log iovec used for
writing to the log. Switch to use a kvec instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
These buffers are not directly logged, just use a kvec and remove the
xlog_copy_from_iovec helper only used here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The lv_size member counts the size of the entire allocation, rename it to
lv_alloc_size to make that clear.
The lv_buf_len member tracks how much of lv_buf has been used up
to format the log item, rename it to lv_buf_used to make that more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Split out handling of ordered items into a single branch in
xlog_cil_insert_format_items so that the rest of the code becomes more
clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
By the time xfs_cil_prepare_item is called, the old lv is still pointed
to by the log item. Take it from there instead of spreading the old lv
logic over xlog_cil_insert_format_items and xfs_cil_prepare_item.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The call to the event xfs_reflink_cow_enospc was removed when the COW
handling was merged into xfs_file_iomap_begin_delay, but the trace event
itself was not. Remove it.
Fixes: db46e604adf8 ("xfs: merge COW handling into xfs_file_iomap_begin_delay")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The trace event xfs_discard_rtrelax was added but never used. Remove it.
Fixes: a330cae8a7147 ("xfs: Remove header files which are included more than once")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The trace event xfs_log_cil_return was added but never used. Remove it.
Fixes: c1220522ef405 ("xfs: grant heads track byte counts, not LSNs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The tracepoint trace_xfs_dqreclaim_dirty was removed with other code
removed from xfs_qm_dquot_isolate() but the defined tracepoint was not.
Fixes: d62016b1a2df ("xfs: avoid dquot buffer pin deadlock")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Replace the deprecated strncpy() with memtostr_pad(). This also avoids
the need for separate zeroing using memset(). Mark sb_fname buffer with
__nonstring as its size is XFSLABEL_MAX and so no terminating NULL for
sb_fname.
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Fixes: e967dc40d501 ("xfs: return the allocated transaction from xfs_trans_alloc_empty")
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The top of the function comment is outdated, and the parts still correct
duplicate information in comment inside the function. Remove the top of
the function comment and instead improve a comment inside the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Describe the rationale for the decisions a bit better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
None of them actually needs the inode, the mount is enough.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This member just tracks how much space we handed out for sequential
write required zones. Only for conventional space it actually is the
pointer where thing are written at, otherwise zone append manages
that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
i_used_blocks is a uint32_t, so use the same value for the local variable
caching it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Split up the XFS_IS_CORRUPT statement so that it immediately shows
if the reference counter overflowed or underflowed.
I ran into this quite a bit when developing the zoned allocator, and had
to reapply the patch for some work recently. We might as well just apply
it upstream given that freeing group is far removed from performance
critical code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Almost no users of the typedef left, kill it and switch the remaining
users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
XFS stopped using current->journal_info in commit f2e812c1522d ("xfs:
don't use current->journal_info"), so there is no point in saving and
restoring it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xchk_trans_alloc_empty can't return errors, so return the allocated
transaction directly instead of an output double pointer argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_trans_alloc_empty can't return errors, so return the allocated
transaction directly instead of an output double pointer argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_trans_roll uses xfs_trans_reserve to basically just call into
xfs_log_regrant while bypassing the reset of xfs_trans_reserve.
Open code the call to xfs_log_regrant in xfs_trans_roll and simplify
xfs_trans_reserve now that it never regrants and always asks for a log
reservation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_trans_alloc_empty only shares the very basic transaction structure
allocation and initialization with xfs_trans_alloc.
Split out a new __xfs_trans_alloc helper for that and otherwise decouple
xfs_trans_alloc_empty from xfs_trans_alloc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_trans_reserve_more just tries to allocate additional blocks and/or
rtextents and is otherwise unrelated to the transaction reservation
logic. Open code the block and rtextent reservation in
xfs_trans_reserve_more to prepare for simplifying xfs_trans_reserve.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Instead of duplicating the empty transacaction reservation
definition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Use cmp_int() to yield the result of a three-way-comparison instead of
performing subtractions with extra casts. Thus also rename the function
to make its name clearer in purpose.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Perhaps that's just my silly imagination but 'diff' doesn't look good for
the name of a variable to hold a result of a three-way-comparison
(-1, 0, 1) which is what ->cmp_key_with_cur() does. It implies to contain
an actual difference between the two integer variables but that's not true
anymore after recent refactoring.
Declaring it as int64_t is also misleading now. Plain integer type is
more than enough.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The net value of these functions is to determine the result of a
three-way-comparison between operands of the same type.
Simplify the code using cmp_int() to eliminate potential errors with
opencoded casts and subtractions. This also means we can change the return
value type of cmp_key_with_cur routines from int64_t to int and make the
interface a bit clearer.
Found by Linux Verification Center (linuxtesting.org).
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The net value of these functions is to determine the result of a
three-way-comparison between operands of the same type.
Simplify the code using cmp_int() to eliminate potential errors with
opencoded casts and subtractions. This also means we can change the return
value type of cmp_two_keys routines from int64_t to int and make the
interface a bit clearer.
Found by Linux Verification Center (linuxtesting.org).
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
key_diff routines compare a key value with a cursor value. Make the naming
to be a bit more self-descriptive.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
One may think that diff_two_keys routines are used to compute the actual
difference between the arguments but they return a result of a
three-way-comparison of the passed operands. So it looks more appropriate
to denote them as cmp_two_keys.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_xattr_class was accidentally created as a TRACE_EVENT() instead of a
class with DECLARE_EVENT_CLASS().
Note, TRACE_EVENT() is just defined as:
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DECLARE_EVENT_CLASS(name, \
PARAMS(proto), \
PARAMS(args), \
PARAMS(tstruct), \
PARAMS(assign), \
PARAMS(print)); \
DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
The difference between TRACE_EVENT() and DECLARE_EVENT_CLASS() is that
TRACE_EVENT() also creates an event with the class name.
Switch xfs_xattr_class over to being a class and not an event as it is not
called directly, and that event with the class name takes up unnecessary
memory.
Fixes: e47dcf113ae3 ("xfs: repair extended attributes")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The trace event xfs_file_compat_ioctl is only used when CONFIG_COMPAT is
configured in the build. As trace events can take up to 5K in memory for
text and meta data regardless if they are used, they should not be created
when unused. Add #ifdef CONFIG_COMPAT around the event so that it is only
created when that is configured.
Fixes: cca28fb83d9e6 ("xfs: split xfs_itrace_entry")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When the use of iomap_dio_rw was added, the calls to the trace events
xfs_end_io_direct_unwritten and xfs_end_io_direct_append were removed but
those trace events were not. As trace events can take up to 5K in memory
for text and meta data regardless if they are used or not, they should not
be created when not used. Remove the unused events.
Fixes: acdda3aae146 ("xfs: use iomap_dio_rw")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When the function xfs_flushinval_pages() was removed, it removed the only
caller to the trace event xfs_pagecache_inval. As trace events can take up
to 5K of memory in text and meta data each regardless if they are used or
not, they should not be created when unused. Remove the unused event.
Fixes: fb59581404ab ("xfs: remove xfs_flushinval_pages")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When the function xfs_alloc_space_available() was restructured, it removed
the only calls to the trace event xfs_alloc_near_nominleft. As trace
events take up to 5K of memory for text and meta data for each event, they
should not be created when not used. Remove this unused event.
Fixes: 54fee133ad59 ("xfs: adjust allocation length in xfs_alloc_space_available")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Trace events take up to 5K of memory in text and meta data regardless if
they are used or not. The call to the event xfs_alloc_near_error was
removed when the cursor data structure allocation was introduced. Remove
it as it is no longer used and is just wasting memory.
Fixes: f5e7dbea1e3e ("xfs: introduce allocation cursor data structure")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When xfs_attri_remove_iter() was removed, so was the call to the trace
event xfs_attr_node_removename. As trace events can take up to 5K in
memory for text and meta data regardless if they are used or not, they
should not be created when unused. Remove the unused event.
Fixes: 59782a236b622 ("xfs: remove xfs_attri_remove_iter")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Trace events can take up to 5K in memory for text and meta data per event
regardless if they are used or not, so they should not be defined when not
used. The events xfs_attr_fillstate and xfs_attr_refillstate are only
called in code that is #ifdef out and exists only for future reference.
Remove these unused events. If the code is needed again, then git history
can recover what the events were.
Suggested-by: Christoph Hellwig <hch@lst.de>
Fixes: 59782a236b622 ("xfs: remove xfs_attri_remove_iter")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When the function xfs_attr_rmtval_set() was removed, the call to the
corresponding trace event was also removed but the trace event itself was
not. As trace events can take up to 5K of memory in text and meta data
regardless if they are used or not they should not be created when not
used. Remove the unused trace event.
Fixes: 0e6acf29db6f ("xfs: Remove xfs_attr_rmtval_set")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When the clone/dedupe_file_rang common functions were refactored, it
removed the calls to the xfs_reflink_compare_extents and
xfs_reflink_compare_extents_error events. As each event can take up to 5K
in memory for text and meta data regardless if they are used or not, they
should not be created if they are not used. Remove these unused events.
Fixes: 876bec6f9bbf ("vfs: refactor clone/dedupe_file_range common functions")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The trace event xfs_ioctl_clone was added but never used. As trace events
can take up to 5K of memory in text and meta data regardless if they are
used or not, remove the unused trace event.
Fixes: 53aa1c34f4eb ("xfs: define tracepoints for reflink activities")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|