summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
authorNeilBrown <neilb@suse.de>2025-02-27 12:32:53 +1100
committerChristian Brauner <brauner@kernel.org>2025-02-27 20:00:17 +0100
commit88d5baf69082e5b410296435008329676b687549 (patch)
treeeacae7fee7d59e13c94ed63b1c4eff8f104c849d /Documentation/filesystems
parent71628584df835970d25c334ea03c012daccec4c1 (diff)
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/locking.rst2
-rw-r--r--Documentation/filesystems/porting.rst19
-rw-r--r--Documentation/filesystems/vfs.rst23
3 files changed, 41 insertions, 3 deletions
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index d20a32b77b60..0ec0bb6eb0fb 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -66,7 +66,7 @@ prototypes::
int (*link) (struct dentry *,struct inode *,struct dentry *);
int (*unlink) (struct inode *,struct dentry *);
int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
- int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
+ struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 3ed3f39ecf71..fe0581271d5b 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1178,3 +1178,22 @@ these conditions don't require explicit checks:
LOOKUP_EXCL now means "target must not exist". It can be combined with
LOOK_CREATE or LOOKUP_RENAME_TARGET.
+
+---
+
+** mandatory**
+
+->mkdir() now returns a 'struct dentry *'. If the created inode is
+found to already be in cache and have a dentry (often IS_ROOT()), it will
+need to be spliced into the given name in place of the given dentry.
+That dentry now needs to be returned. If the original dentry is used,
+NULL should be returned. Any error should be returned with
+ERR_PTR().
+
+In general, filesystems which use d_instantiate_new() to install the new
+inode can safely return NULL. Filesystems which may not have an I_NEW inode
+should use d_drop();d_splice_alias() and return the result of the latter.
+
+If a positive dentry cannot be returned for some reason, in-kernel
+clients such as cachefiles, nfsd, smb/server may not perform ideally but
+will fail-safe.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 31eea688609a..ae79c30b6c0c 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -495,7 +495,7 @@ As of kernel 2.6.22, the following members are defined:
int (*link) (struct dentry *,struct inode *,struct dentry *);
int (*unlink) (struct inode *,struct dentry *);
int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
- int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
+ struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
@@ -562,7 +562,26 @@ otherwise noted.
``mkdir``
called by the mkdir(2) system call. Only required if you want
to support creating subdirectories. You will probably need to
- call d_instantiate() just as you would in the create() method
+ call d_instantiate_new() just as you would in the create() method.
+
+ If d_instantiate_new() is not used and if the fh_to_dentry()
+ export operation is provided, or if the storage might be
+ accessible by another path (e.g. with a network filesystem)
+ then more care may be needed. Importantly d_instantate()
+ should not be used with an inode that is no longer I_NEW if there
+ any chance that the inode could already be attached to a dentry.
+ This is because of a hard rule in the VFS that a directory must
+ only ever have one dentry.
+
+ For example, if an NFS filesystem is mounted twice the new directory
+ could be visible on the other mount before it is on the original
+ mount, and a pair of name_to_handle_at(), open_by_handle_at()
+ calls could instantiate the directory inode with an IS_ROOT()
+ dentry before the first mkdir returns.
+
+ If there is any chance this could happen, then the new inode
+ should be d_drop()ed and attached with d_splice_alias(). The
+ returned dentry (if any) should be returned by ->mkdir().
``rmdir``
called by the rmdir(2) system call. Only required if you want