summaryrefslogtreecommitdiff
path: root/fs/overlayfs/inode.c
AgeCommit message (Collapse)Author
2024-05-10ovl: remove duplicate included headerThorsten Blum
Remove duplicate included header file linux/posix_acl.h Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-11-18fs: Pass AT_GETATTR_NOSEC flag to getattr interface functionStefan Berger
When vfs_getattr_nosec() calls a filesystem's getattr interface function then the 'nosec' should propagate into this function so that vfs_getattr_nosec() can again be called from the filesystem's gettattr rather than vfs_getattr(). The latter would add unnecessary security checks that the initial vfs_getattr_nosec() call wanted to avoid. Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass with the new getattr_flags parameter to the getattr interface function. In overlayfs and ecryptfs use this flag to determine which one of the two functions to call. In a recent code change introduced to IMA vfs_getattr_nosec() ended up calling vfs_getattr() in overlayfs, which in turn called security_inode_getattr() on an exiting process that did not have current->fs set anymore, which then caused a kernel NULL pointer dereference. With this change the call to security_inode_getattr() can be avoided, thus avoiding the NULL pointer dereference. Reported-by: <syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Tyler Hicks <code@tyhicks.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Christian Brauner <brauner@kernel.org> Co-developed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-31ovl: Move xattr support to new xattrs.c fileAmir Goldstein
This moves the code from super.c and inode.c, and makes ovl_xattr_get/set() static. This is in preparation for doing more work on xattrs support. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: reorder ovl_want_write() after ovl_inode_lock()Amir Goldstein
Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-18overlayfs: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-30Merge tag 'ovl-update-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - add verification feature needed by composefs (Alexander Larsson) - improve integration of overlayfs and fanotify (Amir Goldstein) - fortify some overlayfs code (Andrea Righi) * tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: validate superblock in OVL_FS() ovl: make consistent use of OVL_FS() ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG ovl: auto generate uuid for new overlay filesystems ovl: store persistent uuid/fsid with uuid=on ovl: add support for unique fsid per instance ovl: support encoding non-decodable file handles ovl: Handle verity during copy-up ovl: Validate verity xattr when resolving lowerdata ovl: Add versioned header for overlay.metacopy xattr ovl: Add framework for verity support
2023-08-12ovl: make consistent use of OVL_FS()Andrea Righi
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a struct super_block. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12ovl: support encoding non-decodable file handlesAmir Goldstein
When all layers support file handles, we support encoding non-decodable file handles (a.k.a. fid) even with nfs_export=off. When file handles do not need to be decoded, we do not need to copy up redirected lower directories on encode, and we encode also non-indexed upper with lower file handle, so fid will not change on copy up. This enables reporting fanotify events with file handles on overlayfs with default config/mount options. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-11fs: drop the timespec64 argument from update_timeJeff Layton
Now that all of the update_time operations are prepared for it, we can drop the timespec64 argument from the update_time operation. Do that and remove it from some associated functions like inode_update_time and inode_needs_update_time. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-19ovl: pass ovl_fs to xino helpersAmir Goldstein
Internal ovl methods should use ovl_fs and not sb as much as possible. Use a constant_table to translate from enum xino mode to string in preperation for new mount api option parsing. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-19ovl: prepare for lazy lookup of lowerdata inodeAmir Goldstein
Make the code handle the case of numlower > 1 and missing lowerdata dentry gracefully. Missing lowerdata dentry is an indication for lazy lookup of lowerdata and in that case the lowerdata_redirect path is stored in ovl_inode. Following commits will defer lookup and perform the lazy lookup on access. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: prepare to store lowerdata redirect for lazy lowerdata lookupAmir Goldstein
Prepare to allow ovl_lookup() to leave the last entry in a non-dir lowerstack empty to signify lazy lowerdata lookup. In this case, ovl_lookup() stores the redirect path from metacopy to lowerdata in ovl_inode, which is going to be used later to perform the lazy lowerdata lookup. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: deduplicate lowerdata and lowerstack[]Amir Goldstein
The ovl_inode contains a copy of lowerdata in lowerstack[], so the lowerdata inode member can be removed. Use accessors ovl_lowerdata*() to get the lowerdata whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: deduplicate lowerpath and lowerstack[]Amir Goldstein
The ovl_inode contains a copy of lowerpath in lowerstack[0], so the lowerpath member can be removed. Use accessor ovl_lowerpath() to get the lowerpath whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: move ovl_entry into ovl_inodeAmir Goldstein
The lower stacks of all the ovl inode aliases should be identical and there is redundant information in ovl_entry and ovl_inode. Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS per overlay dentry. Following patches will deduplicate redundant ovl_inode fields. Note that for pure upper and negative dentries, OVL_E(dentry) may be NULL now, so it is imporatnt to use the ovl_numlower() accessor. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: fix null pointer dereference in ovl_get_acl_rcu()Zhihao Cheng
Following process: P1 P2 path_openat link_path_walk may_lookup inode_permission(rcu) ovl_permission acl_permission_check check_acl get_cached_acl_rcu ovl_get_inode_acl realinode = ovl_inode_real(ovl_inode) drop_cache __dentry_kill(ovl_dentry) iput(ovl_inode) ovl_destroy_inode(ovl_inode) dput(oi->__upperdentry) dentry_kill(upperdentry) dentry_unlink_inode upperdentry->d_inode = NULL ovl_inode_upper upperdentry = ovl_i_dentry_upper(ovl_inode) d_inode(upperdentry) // returns NULL IS_POSIXACL(realinode) // NULL pointer dereference , will trigger an null pointer dereference at realinode: [ 205.472797] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 205.476701] CPU: 2 PID: 2713 Comm: ls Not tainted 6.3.0-12064-g2edfa098e750-dirty #1216 [ 205.478754] RIP: 0010:do_ovl_get_acl+0x5d/0x300 [ 205.489584] Call Trace: [ 205.489812] <TASK> [ 205.490014] ovl_get_inode_acl+0x26/0x30 [ 205.490466] get_cached_acl_rcu+0x61/0xa0 [ 205.490908] generic_permission+0x1bf/0x4e0 [ 205.491447] ovl_permission+0x79/0x1b0 [ 205.491917] inode_permission+0x15e/0x2c0 [ 205.492425] link_path_walk+0x115/0x550 [ 205.493311] path_lookupat.isra.0+0xb2/0x200 [ 205.493803] filename_lookup+0xda/0x240 [ 205.495747] vfs_fstatat+0x7b/0xb0 Fetch a reproducer in [Link]. Use the helper ovl_i_path_realinode() to get realinode and then do non-nullptr checking. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217404 Fixes: 332f606b32b6 ("ovl: enable RCU'd ->get_acl()") Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: fix null pointer dereference in ovl_permission()Zhihao Cheng
Following process: P1 P2 path_lookupat link_path_walk inode_permission ovl_permission ovl_i_path_real(inode, &realpath) path->dentry = ovl_i_dentry_upper(inode) drop_cache __dentry_kill(ovl_dentry) iput(ovl_inode) ovl_destroy_inode(ovl_inode) dput(oi->__upperdentry) dentry_kill(upperdentry) dentry_unlink_inode upperdentry->d_inode = NULL realinode = d_inode(realpath.dentry) // return NULL inode_permission(realinode) inode->i_sb // NULL pointer dereference , will trigger an null pointer dereference at realinode: [ 335.664979] BUG: kernel NULL pointer dereference, address: 0000000000000002 [ 335.668032] CPU: 0 PID: 2592 Comm: ls Not tainted 6.3.0 [ 335.669956] RIP: 0010:inode_permission+0x33/0x2c0 [ 335.678939] Call Trace: [ 335.679165] <TASK> [ 335.679371] ovl_permission+0xde/0x320 [ 335.679723] inode_permission+0x15e/0x2c0 [ 335.680090] link_path_walk+0x115/0x550 [ 335.680771] path_lookupat.isra.0+0xb2/0x200 [ 335.681170] filename_lookup+0xda/0x240 [ 335.681922] vfs_statx+0xa6/0x1f0 [ 335.682233] vfs_fstatat+0x7b/0xb0 Fetch a reproducer in [Link]. Use the helper ovl_i_path_realinode() to get realinode and then do non-nullptr checking. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217405 Fixes: 4b7791b2e958 ("ovl: handle idmappings in ovl_permission()") Cc: <stable@vger.kernel.org> # v5.19 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-01-19fs: port vfs{g,u}id helpers to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port privilege checking helpers to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port inode_owner_or_capable() to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->permission() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->fileattr_set() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->set_acl() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->get_acl() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->getattr() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-03ovl: call posix_acl_release() after error checkingChristian Brauner
The current placement of posix_acl_release() in ovl_set_or_remove_acl() means it can be called on an error pointer instead of actual acls. Fix this by moving the posix_acl_release() call after the error handling. Fixes: 0e641857322f ("ovl: implement set acl method") # mainline only Reported-by: syzbot+3f6ef1c4586bb6fd1f61@syzkaller.appspotmail.com Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20ovl: use posix acl apiChristian Brauner
Now that posix acls have a proper api us it to copy them. All filesystems that can serve as lower or upper layers for overlayfs have gained support for the new posix acl api in previous patches. So switch all internal overlayfs codepaths for copying posix acls to the new posix acl api. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20ovl: implement set acl methodChristian Brauner
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. In order to build a type safe posix api around get and set acl we need all filesystem to implement get and set acl. Now that we have added get and set acl inode operations that allow easy access to the dentry we give overlayfs it's own get and set acl inode operations. The set acl inode operation is duplicates most of the ovl posix acl xattr handler. The main difference being that the set acl inode operation relies on the new posix acl api. Once the vfs has been switched over the custom posix acl xattr handler will be removed completely. Note, until the vfs has been switched to the new posix acl api this patch is a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20ovl: implement get acl methodChristian Brauner
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. In order to build a type safe posix api around get and set acl we need all filesystem to implement get and set acl. Now that we have added get and set acl inode operations that allow easy access to the dentry we give overlayfs it's own get and set acl inode operations. Since overlayfs is a stacking filesystem it will use the newly added posix acl api when retrieving posix acls from the relevant layer. Since overlayfs can also be mounted on top of idmapped layers. If idmapped layers are used overlayfs must take the layer's idmapping into account after it retrieved the posix acls from the relevant layer. Note, until the vfs has been switched to the new posix acl api this patch is a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20fs: rename current get acl methodChristian Brauner
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. The current inode operation for getting posix acls takes an inode argument but various filesystems (e.g., 9p, cifs, overlayfs) need access to the dentry. In contrast to the ->set_acl() inode operation we cannot simply extend ->get_acl() to take a dentry argument. The ->get_acl() inode operation is called from: acl_permission_check() -> check_acl() -> get_acl() which is part of generic_permission() which in turn is part of inode_permission(). Both generic_permission() and inode_permission() are called in the ->permission() handler of various filesystems (e.g., overlayfs). So simply passing a dentry argument to ->get_acl() would amount to also having to pass a dentry argument to ->permission(). We should avoid this unnecessary change. So instead of extending the existing inode operation rename it from ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that passes a dentry argument and which filesystems that need access to the dentry can implement instead of ->get_inode_acl(). Filesystems like cifs which allow setting and getting posix acls but not using them for permission checking during lookup can simply not implement ->get_inode_acl(). This is intended to be a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-06Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs constification updates from Al Viro: "whack-a-mole: constifying struct path *" * tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: constify path spufs: constify path nd_jump_link(): constify path audit_init_parent(): constify path __io_setxattr(): constify path do_proc_readlink(): constify path overlayfs: constify path fs/notify: constify path may_linkat(): constify path do_sys_name_to_handle(): constify path ->getprocattr(): attribute name is const char *, TYVM...
2022-09-01overlayfs: constify pathAl Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-17acl: handle idmapped mounts for idmapped filesystemsChristian Brauner
Ensure that POSIX ACLs checking, getting, and setting works correctly for filesystems mountable with a filesystem idmapping ("fs_idmapping") that want to support idmapped mounts ("mnt_idmapping"). Note that no filesystems mountable with an fs_idmapping do yet support idmapped mounts. This is required infrastructure work to unblock this. As we explained in detail in [1] the fs_idmapping is irrelevant for getxattr() and setxattr() when mapping the ACL_{GROUP,USER} {g,u}ids stored in the uapi struct posix_acl_xattr_entry in posix_acl_fix_xattr_{from,to}_user(). But for acl_permission_check() and posix_acl_{g,s}etxattr_idmapped_mnt() the fs_idmapping matters. acl_permission_check(): During lookup POSIX ACLs are retrieved directly via i_op->get_acl() and are returned via the kernel internal struct posix_acl which contains e_{g,u}id members of type k{g,u}id_t that already take the fs_idmapping into acccount. For example, a POSIX ACL stored with u4 on the backing store is mapped to k10000004 in the fs_idmapping. The mnt_idmapping remaps the POSIX ACL to k20000004. In order to do that the fs_idmapping needs to be taken into account but that doesn't happen yet (Again, this is a counterfactual currently as fuse doesn't support idmapped mounts currently. It's just used as a convenient example.): fs_idmapping: u0:k10000000:r65536 mnt_idmapping: u0:v20000000:r65536 ACL_USER: k10000004 acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) { k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, u4 /* ACL_USER */); } -> posix_acl_permission() { -1 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, &init_user_ns, k10000004); vfsuid_eq_kuid(-1, k10000004 /* caller_fsuid */) } In order to correctly map from the fs_idmapping into mnt_idmapping we require the relevant fs_idmaping to be passed: acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) { k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, u4 /* ACL_USER */); } -> posix_acl_permission() { v20000004 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, u0:k10000000:r65536 /* fs_idmapping */, k10000004); vfsuid_eq_kuid(v20000004, k10000004 /* caller_fsuid */) } The initial_idmapping is only correct for the current situation because all filesystems that currently support idmapped mounts do not support being mounted with an fs_idmapping. Note that ovl_get_acl() is used to retrieve the POSIX ACLs from the relevant lower layer and the lower layer's mnt_idmapping needs to be taken into account and so does the fs_idmapping. See 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") for more details. For posix_acl_{g,s}etxattr_idmapped_mnt() it is not as obvious why the fs_idmapping matters as it is for acl_permission_check(). Especially because it doesn't matter for posix_acl_fix_xattr_{from,to}_user() (See [1] for more context.). Because posix_acl_{g,s}etxattr_idmapped_mnt() operate on the uapi struct posix_acl_xattr_entry which contains {g,u}id_t values and thus give the impression that the fs_idmapping is irrelevant as at this point appropriate {g,u}id_t values have seemlingly been generated. As we've stated multiple times this assumption is wrong and in fact the uapi struct posix_acl_xattr_entry is taking idmappings into account depending at what place it is operated on. posix_acl_getxattr_idmapped_mnt() When posix_acl_getxattr_idmapped_mnt() is called the values stored in the uapi struct posix_acl_xattr_entry are mapped according to the fs_idmapping. This happened when they were read from the backing store and then translated from struct posix_acl into the uapi struct posix_acl_xattr_entry during posix_acl_to_xattr(). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry have been generated by it. So we need to take the fs_idmapping into account during make_vfsuid() in posix_acl_getxattr_idmapped_mnt(). posix_acl_setxattr_idmapped_mnt() When posix_acl_setxattr_idmapped_mnt() is called the values stored as {g,u}id_t in uapi struct posix_acl_xattr_entry are intended to be the values that ultimately get turned back into a k{g,u}id_t in posix_acl_from_xattr() (which turns the uapi struct posix_acl_xattr_entry into the kernel internal struct posix_acl). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry are intended to be the values that will be undone in the fs_idmapping when writing to the backing store. So we need to take the fs_idmapping into account during from_vfsuid() in posix_acl_setxattr_idmapped_mnt(). Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Fixes: 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") Cc: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20220816113514.43304-1-brauner@kernel.org
2022-08-08Merge tag 'ovl-update-6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "Just a small update" * tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix spelling mistakes ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() ovl: improve ovl_get_acl() if POSIX ACL support is off ovl: fix some kernel-doc comments ovl: warn if trusted xattr creation fails
2022-07-28ovl: improve ovl_get_acl() if POSIX ACL support is offYang Xu
Provide a proper stub for the !CONFIG_FS_POSIX_ACL case. Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-07-15ovl: handle idmappings in ovl_get_acl()Christian Brauner
During permission checking overlayfs will call ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() -> get_acl() /* on the underlying filesystem */ -> inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission() passing through the get_acl() request to the underlying filesystem. Before returning these values to the VFS we need to take the idmapping of the relevant layer into account and translate any ACL_{GROUP,USER} values according to the idmapped mount. We cannot alter the ACLs returned from the relevant layer directly as that would alter the cached values filesystem wide for the lower filesystem. Instead we can clone the ACLs and then apply the relevant idmapping of the layer. This is obviously only relevant when idmapped layers are used. Link: https://lore.kernel.org/r/20220708090134.385160-4-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: linux-unionfs@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-04-28ovl: handle idmappings in ovl_xattr_{g,s}et()Christian Brauner
When retrieving xattrs from the upper or lower layers take the relevant mount's idmapping into account. We rely on the previously introduced ovl_i_path_real() helper to retrieve the relevant path. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: handle idmappings in ovl_permission()Christian Brauner
Use the previously introduced ovl_i_path_real() helper to retrieve the relevant upper or lower path and take the mount's idmapping into account for the lower layer permission check. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: use ovl_copy_{real,upper}attr() wrappersChristian Brauner
When copying inode attributes from the upper or lower layer to ovl inodes we need to take the upper or lower layer's mount's idmapping into account. In a lot of places we call ovl_copyattr() only on upper inodes and in some we call it on either upper or lower inodes. Split this into two separate helpers. The first one should only be called on upper inodes and is thus called ovl_copy_upperattr(). The second one can be called on upper or lower inodes. We add ovl_copy_realattr() for this task. The new helper makes use of the previously added ovl_i_path_real() helper. This is needed to support idmapped base layers with overlay. When overlay copies the inode information from an upper or lower layer to the relevant overlay inode it will apply the idmapping of the upper or lower layer when doing so. The ovl inode ownership will thus always correctly reflect the ownership of the idmapped upper or lower layer. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: store lower path in ovl_inodeAmir Goldstein
Create some ovl_i_* helpers to get real path from ovl inode. Instead of just stashing struct inode for the lower layer we stash struct path for the lower layer. The helpers allow to retrieve a struct path for the relevant upper or lower layer. This will be used when retrieving information based on struct inode when copying up inode attributes from upper or lower inodes to ovl inodes and when checking permissions in ovl_permission() in following patches. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: handle idmappings for layer fileattrsChristian Brauner
Take the upper mount's idmapping into account when setting fileattrs on the upper layer. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: use ovl_path_getxattr() wrapperChristian Brauner
Add a helper that allows to retrieve ovl xattrs from either lower or upper layers. To stop passing mnt and dentry separately everywhere use struct path which more accurately reflects the tight coupling between mount and dentry in this helper. Swich over all places to pass a path argument that can operate on either upper or lower layers. This is needed to support idmapped base layers with overlayfs. Some helpers are always called with an upper dentry, which is now utilized by these helpers to create the path. Make this usage explicit by renaming the argument to "upperdentry" and by renaming the function as well in some cases. Also add a check in ovl_do_getxattr() to catch misuse of these functions. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: use ovl_do_notify_change() wrapperChristian Brauner
Introduce ovl_do_notify_change() as a simple wrapper around notify_change() to support idmapped layers. The helper mirrors other ovl_do_*() helpers that operate on the upper layers. When changing ownership of an upper object the intended ownership needs to be mapped according to the upper layer's idmapping. This mapping is the inverse to the mapping applied when copying inode information from an upper layer to the corresponding overlay inode. So e.g., when an upper mount maps files that are stored on-disk as owned by id 1001 to 1000 this means that calling stat on this object from an idmapped mount will report the file as being owned by id 1000. Consequently in order to change ownership of an object in this filesystem so it appears as being owned by id 1000 in the upper idmapped layer it needs to store id 1001 on disk. The mnt mapping helpers take care of this. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28ovl: use wrappers to all vfs_*xattr() callsAmir Goldstein
Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs and use helpers ovl_do_*xattr() to access generic xattrs. This is a preparatory patch for using idmapped base layers with overlay. Note that a few of those places called vfs_*xattr() calls directly to reduce the amount of debug output. But as Miklos pointed out since overlayfs has been stable for quite some time the debug output isn't all that relevant anymore and the additional debug in all locations was actually quite helpful when developing this patch series. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-04ovl: fix filattr copy-up failureMiklos Szeredi
This regression can be reproduced with ntfs-3g and overlayfs: mkdir lower upper work overlay dd if=/dev/zero of=ntfs.raw bs=1M count=2 mkntfs -F ntfs.raw mount ntfs.raw lower touch lower/file.txt mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work - overlay mv overlay/file.txt overlay/file2.txt mv fails and (misleadingly) prints mv: cannot move 'overlay/file.txt' to a subdirectory of itself, 'overlay/file2.txt' The reason is that ovl_copy_fileattr() is triggered due to S_NOATIME being set on all inodes (by fuse) regardless of fileattr. ovl_copy_fileattr() tries to retrieve file attributes from lower file, but that fails because filesystem does not support this ioctl (this should fail with ENOTTY, but ntfs-3g return EINVAL instead). This failure is propagated to origial operation (in this case rename) that triggered the copy-up. The fix is to ignore ENOTTY and EINVAL errors from fileattr_get() in copy up. This also requires turning the internal ENOIOCTLCMD into ENOTTY. As a further measure to prevent unnecessary failures, only try the fileattr_get/set on upper if there are any flags to copy up. Side note: a number of filesystems set S_NOATIME (and sometimes other inode flags) irrespective of fileattr flags. This causes unnecessary calls during copy up, which might lead to a performance issue, especially if latency is high. To fix this, the kernel would need to differentiate between the two cases. E.g. introduce SB_NOATIME_UPDATE, a per-sb variant of S_NOATIME. SB_NOATIME doesn't work, because that's interpreted as "filesystem doesn't store an atime attribute" Reported-and-tested-by: Kevin Locke <kevin@kevinlocke.name> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18ovl: enable RCU'd ->get_acl()Miklos Szeredi
Overlayfs does not cache ACL's (to avoid double caching). Instead it just calls the underlying filesystem's i_op->get_acl(), which will return the cached value, if possible. In rcu path walk, however, get_cached_acl_rcu() is employed to get the value from the cache, which will fail on overlayfs resulting in dropping out of rcu walk mode. This can result in a big performance hit in certain situations. Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which indicates pass-through) Reported-by: garyhuang <zjh.20052005@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18vfs: add rcu argument to ->get_acl() callbackMiklos Szeredi
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17ovl: update ctime when changing fileattrChengguang Xu
Currently we keep size, mode and times of overlay inode as the same as upper inode, so should update ctime when changing file attribution as well. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17ovl: skip checking lower file's i_writecount on truncateChengguang Xu
It is possible that a directory tree is shared between multiple overlay instances as a lower layer. In this case when one instance executes a file residing on the lower layer, the other instance denies a truncate(2) call on this file. This only happens for truncate(2) and not for open(2) with the O_TRUNC flag. Fix this interference and inconsistency by removing the preliminary i_writecount check before copy-up. This means that unlike on normal filesystems truncate(argv[0]) will now succeed. If this ever causes a regression in a real world use case this needs to be revisited. One way to fix this properly would be to keep a correct i_writecount in the overlay inode, but that is difficult due to memory mapping code only dealing with the real file/inode. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>