Age | Commit message (Collapse) | Author |
|
Object debugging generally needs special provisions for putting said
objects on the stack, which rhashtable does not have.
Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we think we're read-only but the VFS doesn't, fun will ensue.
And now that we know we have to be able to do this safely, just make
nochanges imply ro.
Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/
Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_lost_data() gets called on btree node read error, but the
error might be transient.
btree_node_scan is expensive, and there's no need to run it persistently
(marking it in the superblock as required to run) - check_topology
will run it if required, via bch2_get_scanned_nodes().
Running it non-persistently is fine, to avoid check_topology having to
rewind recovery to run it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_increase_depth() was originally for disaster recovery, to get
some data back from the journal when a btree root was bad.
We don't need it for that purpose anymore; on bad btree root we'll
launch btree node scan and reconstruct all the interior nodes.
If there's a key in the journal for a depth that doesn't exists, and
it's not from check_topology/btree node scan, we should just ignore it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Invalidate pagecache after we write the new superblock and send a
uevent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems excessive forced btree node rewrites can cause interior btree
updates to become wedged during recovery, before we're using the write
buffer for backpointer updates.
Add more flags so we can determine where these are coming from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a deadlock during recovery where interior btree updates became
wedged and all open_buckets were consumed; start adding more
introspection.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the specific error being corrected in the journal when we're
repairing, this helps greatly with 'bcachefs list_journal' analysis.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The next patch will add logging of the specific error being corrected in
repair paths to the journal; this means __bch2_fsck_err() can return
transaction restarts in places that previously weren't expecting them.
check_topology() is old code that doesn't use btree iterators for btree
node locking - it'll have to be rewritten in the future to work online.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
just set DCACHE_DONTCACHE for "don't cache" mounts...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
makes simple_lookup() slightly cheaper there - no need for
simple_lookup() to set the flag and we want it on everything
on those anyway.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
No users left and anything that wants it would be better off just
setting DCACHE_DONTCACHE in their ->s_d_flags.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Default ->d_op being simple_dentry_operations is equivalent to leaving
it NULL and putting DCACHE_DONTCACHE into ->s_d_flags.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Do that before new dentry is visible anywhere. It does create
a new possible state for dentries present in ->d_children/->d_sib -
DCACHE_PAR_LOOKUP present, negative, unhashed, not in in-lookup
hash chains, refcount positive. Those are going to be skipped
by all tree-walkers (both d_walk() callbacks in fs/dcache.c and
explicit loops over children/sibling lists elsewhere) and
dput() is fine with those.
NOTE: dropping the final reference to a "normal" in-lookup dentry
(in in-lookup hash) is a bug - somebody must've forgotten to
call d_lookup_done() on it and bad things will happen. With those
it's OK; if/when we get around to making __dentry_kill() complain
about such breakage, remember that predicate to check should
*not* be just d_in_lookup(victim) but rather a combination of that
with !hlist_bl_unhashed(&victim->d_u.d_in_lookup_hash). Might
be worth considering later...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Convert the last user (d_alloc_pseudo()) and be done with that.
Any out-of-tree filesystem using it should switch to d_splice_alias_ops()
or, better yet, check whether it really needs to have ->d_op vary among
its dentries.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
No need to mess with ->d_op at all. Note that ->d_delete that always
returns 1 is equivalent to having DCACHE_DONTCACHE in ->d_flags.
Later the same thing will be placed into ->s_d_flags of the filesystems
where we want that behaviour for all dentries; then the check in
simple_lookup() will at least get unlikely() slapped on it.
NOTE: there are only two filesystems where
* simple_lookup() might be called
* default ->d_op is non-NULL
* its ->d_delete() doesn't always return 1
If not for those, we could have simple_lookup() just set DCACHE_DONTCACHE
without even looking at ->d_op. Filesystems in question are btrfs
and tracefs; both have ->d_delete() returning 1 on anything fed to
simple_lookup(), so both would be fine with simple_lookup() setting
DCACHE_DONTCACHE regardless of ->d_op.
IOW, we might want to drop the check for ->d_op in simple_lookup();
it's definitely a separate story, though.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If a lookup in tracefs is done on a file that does not exist, it leaves a
dentry hanging around until memory pressure removes it. But eventfs
dentries should hang around as when their ref count goes to zero, it
requires more work to recreate it. For the rest of the tracefs dentries,
they hang around as their dentry is used as a descriptor for the tracing
system. But if a file lookup happens for a file in tracefs that does not
exist, it should be deleted.
Add a .d_delete callback that checks if dentry->fsdata is set or not. Only
eventfs dentries set fsdata so if it has content it should not be deleted
and should hang around in the cache.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and store it in ->s_d_flags, to be used by __d_alloc()
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
DCACHE_OP_PRUNE in ->d_flags at the time of d_set_d_op() should've
been treated the same as any other DCACHE_OP_... - we forgot to adjust
that WARN_ON() when DCACHE_OP_PRUNE had been introduced...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
rather than locking the directory and using lookup_one(), just use
lookup_one_unlocked(). This keeps locking code centralised.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250608230952.20539-5-neil@brown.name
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The code in coda_readdir() is nearly identical to iterate_dir().
Differences are:
- iterate_dir() is killable
- iterate_dir() adds permission checking and accessing notifications
I believe these are not harmful for coda so it is best to use
iterate_dir() directly. This will allow locking changes without
touching the code in coda.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250608230952.20539-4-neil@brown.name
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The effect of lookup_one_qstr_excl_raw() can be achieved by passing
LOOKUP_CREATE() to lookup_one_qstr_excl() - we don't need a separate
function.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250608230952.20539-2-neil@brown.name
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The recent change from using d_hash_and_lookup() to using
try_lookup_noperm() inadvertently introduce a d_revalidate() call when
the lookup was successful. Steven French reports that this resulted in
worse than halving of performance in some cases.
Prior to the offending patch the only caller of try_lookup_noperm() was
autofs which does not need the d_revalidate(). So it is safe to remove
the d_revalidate() call providing we stop using try_lookup_noperm() to
implement lookup_noperm().
The "try_" in the name is strongly suggestive that the caller isn't
expecting much effort, so it seems reasonable to avoid the effort of
d_revalidate().
Fixes: 06c567403ae5 ("Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS")
Reported-by: Steve French <smfrench@gmail.com>
Link: https://lore.kernel.org/all/CAH2r5mu5SfBrdc2CFHwzft8=n9koPMk+Jzwpy-oUMx-wCRCesQ@mail.gmail.com/
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/174951744454.608730.18354002683881684261@noble.neil.brown.name
Tested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Apart from the network and mount namespace all other namespaces expose a
stable inode number and userspace has been relying on that for a very
long time now. It's very much heavily used API. Align the mount
namespace and use a stable inode number from the reserved procfs inode
number space so this is consistent across all namespaces.
Link: https://lore.kernel.org/20250606-work-nsfs-v1-3-b8749c9a8844@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... to be used instead of manually assigning to ->s_d_op.
All in-tree filesystem converted (and field itself is renamed,
so any out-of-tree ones in need of conversion will be caught
by compiler).
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
->d_revalidate() is never called for root anyway...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Uses of d_set_d_op() on live dentry can be very dangerous; it is going
to be withdrawn and replaced with saner things.
The best way for a filesystem is to have the default dentry_operations
set at mount time and be done with that - __d_alloc() will use that.
Currently there are two cases when d_set_d_op() is used on a live dentry -
one is procfs, which has several genuinely different dentry_operations
instances (different ->d_revalidate(), etc.) and another is
simple_lookup(), where we would be better off without overriding ->d_op.
For procfs we have d_set_d_op() calls followed by d_splice_alias();
provide a new helper (d_splice_alias_ops(inode, dentry, d_ops)) that would
combine those two, and do the d_set_d_op() part while under ->d_lock.
That eliminates one of the places where ->d_flags had been modified
without holding ->d_lock; current behaviour is not racy, but the reasons
for that are far too brittle. Better move to uniform locking rules and
simpler proof of correctness...
The next commit will convert procfs to use of that helper; it is not
exported and won't be until somebody comes up with convincing modular
user for it.
Again, the best approach is to have default ->d_op and let __d_alloc()
do the right thing; filesystem _may_ need non-uniform ->d_op (procfs
does), but there'd better be good reasons for that.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It has two possible values - one for "forced lookup" entries, another
for the normal ones. We'd be better off with that as an explicit
flag anyway and in addition to that it opens some fun possibilities
with ->d_op and ->d_flags handling.
[moved PROC_ENTRY_FORCE_LOOKUP to include/linux/proc_fs.h, switched it
to an unused bit - there was a conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This conversion moved the folio_unlock() to inside __write_node_folio(),
but missed one caller so we had a double-unlock on this path.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chao Yu <chao@kernel.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: syzbot+c0dc46208750f063d0e0@syzkaller.appspotmail.com
Fixes: 80f31d2a7e5f (f2fs: return bool from __write_node_folio)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Currently the function that does this takes a struct file_lock, but
__locks_wake_up_blocks() deals with both locks and leases. Currently
this works because both file_lock and file_lease have the file_lock_core
at the beginning of the struct, but it's fragile to rely on that.
Add a new locks_wake_up_waiter() function and call that from
__locks_wake_up_blocks().
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/20250602-filelock-6-16-v1-1-7da5b2c930fd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than have the caller set the FMODE_NOWAIT flags for both output
files, move it to create_pipe_files() where other f_mode flags are set
anyway with stream_open(). With that, both __do_pipe_flags() and
io_pipe() can remove the manual setting of the NOWAIT flags.
No intended functional changes, just a code cleanup.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/1f0473f8-69f3-4eb1-aa77-3334c6a71d24@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
'implemenation' --> 'implementation'.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/20250530173204.3611576-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If SMB 3.1.1 POSIX Extensions are available and negotiated, the client
should be able to use all characters and not remap anything. Currently, the
user has to explicitly request this behavior by specifying the "nomapposix"
mount option.
Link: https://lore.kernel.org/4195bb677b33d680e77549890a4f4dd3b474ceaf.camel@rx2.rx-server.de
Signed-off-by: Philipp Kerling <pkerling@casix.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
IOW, read_seqlock_excl() is sufficient there; no need to bother
with write_seqlock() (forcing all rename_lock readers into retry).
That leaves rename_lock taken for write only when we want to change
someone's parent or name.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due massive
conflicts against Linux-next. Now that everything is upstream finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
|
|
Pull mount fixes from Al Viro:
"Various mount-related bugfixes:
- split the do_move_mount() checks in subtree-of-our-ns and
entire-anon cases and adapt detached mount propagation selftest for
mount_setattr
- allow clone_private_mount() for a path on real rootfs
- fix a race in call of has_locked_children()
- fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP
- make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right
userns
- avoid false negatives in path_overmount()
- don't leak MNT_LOCKED from parent to child in finish_automount()
- do_change_type(): refuse to operate on unmounted/not ours mounts"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_change_type(): refuse to operate on unmounted/not ours mounts
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
selftests/mount_setattr: adapt detached mount propagation test
do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
fs: allow clone_private_mount() for a path on real rootfs
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
finish_automount(): don't leak MNT_LOCKED from parent to child
path_overmount(): avoid false negatives
fs/fhandle.c: fix a race in call of has_locked_children()
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- multichannel/reconnect fixes
- move smbdirect (smb over RDMA) defines to fs/smb/common so they will
be able to be used in the future more broadly, and a documentation
update explaining setting up smbdirect mounts
- update email address for Paulo
* tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
MAINTAINERS, mailmap: Update Paulo Alcantara's email address
cifs: add documentation for smbdirect setup
cifs: do not disable interface polling on failure
cifs: serialize other channels when query server interfaces is pending
cifs: deal with the channel loading lag while picking channels
smb: client: make use of common smbdirect_socket_parameters
smb: smbdirect: introduce smbdirect_socket_parameters
smb: client: make use of common smbdirect_socket
smb: smbdirect: add smbdirect_socket.h
smb: client: make use of common smbdirect.h
smb: smbdirect: add smbdirect.h with public structures
smb: client: make use of common smbdirect_pdu.h
smb: smbdirect: add smbdirect_pdu.h with protocol definitions
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2 and UBIFS fixes from Richard Weinberger:
"JFFS2:
- Correctly check return code of jffs2_prealloc_raw_node_refs()
UBIFS:
- Spelling fixes"
* tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: check jffs2_prealloc_raw_node_refs() result in few other places
jffs2: check that raw node were preallocated before writing summary
ubifs: Fix grammar in error message
|
|
Ensure that propagation settings can only be changed for mounts located
in the caller's mount namespace. This change aligns permission checking
with the rest of mount(2).
Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 07b20889e305 ("beginning of the shared-subtree proper")
Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|