summaryrefslogtreecommitdiff
path: root/fs/overlayfs/overlayfs.h
AgeCommit message (Collapse)Author
2019-02-13ovl: Do not lose security.capability xattr over metadata file copy-upVivek Goyal
If a file has been copied up metadata only, and later data is copied up, upper loses any security.capability xattr it has (underlying filesystem clears it as upon file write). From a user's point of view, this is just a file copy-up and that should not result in losing security.capability xattr. Hence, before data copy up, save security.capability xattr (if any) and restore it on upper after data copy up is complete. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Fixes: 0c2888749363 ("ovl: A new xattr OVL_XATTR_METACOPY for file on upper") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-26ovl: abstract ovl_inode lock with a helperAmir Goldstein
The abstraction improves code readabilty (to some). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-26ovl: remove the 'locked' argument of ovl_nlink_{start,end}Amir Goldstein
It just makes the interface strange without adding any significant value. The only case where locked is false and return value is 0 is in ovl_rename() when new is negative, so handle that case explicitly in ovl_rename(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-04ovl: fix format of setxattr debugMiklos Szeredi
Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12
2018-07-20ovl: add helper to force data copy-upVivek Goyal
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Check redirect on index as wellVivek Goyal
Right now we seem to check redirect only if upperdentry is found. But it is possible that there is no upperdentry but later we found an index. We need to check redirect on index as well and set it in ovl_inode->redirect. Otherwise link code can assume that dentry does not have redirect and place a new one which breaks things. In my testing overlay/033 test started failing in xfstests. Following are the details. For example do following. $ mkdir lower upper work merged - Make lower dir with 4 links. $ echo "foo" > lower/l0.txt $ ln lower/l0.txt lower/l1.txt $ ln lower/l0.txt lower/l2.txt $ ln lower/l0.txt lower/l3.txt - Mount with index on and metacopy on. $ mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,\ index=on,metacopy=on none merged - Link lower $ ln merged/l0.txt merged/l4.txt (This will metadata copy up of l0.txt and put an absolute redirect /l0.txt) $ echo 2 > /proc/sys/vm/drop/caches $ ls merged/l1.txt (Now l1.txt will be looked up. There is no upper dentry but there is lower dentry and index will be found. We don't check for redirect on index, hence ovl_inode->redirect will be NULL.) - Link Upper $ ln merged/l4.txt merged/l5.txt (Lookup of l4.txt will use inode from l1.txt lookup which is still in cache. It has ovl_inode->redirect NULL, hence link will put a new redirect and replace /l0.txt with /l4.txt - Drop caches. echo 2 > /proc/sys/vm/drop_caches - List l1.txt and it returns -ESTALE $ ls merged/l0.txt (It returns stale because, we found a metacopy of l0.txt in upper and it has redirect l4.txt but there is no file named l4.txt in lower layer. So lower data copy is not found and -ESTALE is returned.) So problem here is that we did not process redirect on index. Check redirect on index as well and then problem is fixed. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Add an inode flag OVL_CONST_INOVivek Goyal
Add an ovl_inode flag OVL_CONST_INO. This flag signifies if inode number will remain constant over copy up or not. This flag does not get updated over copy up and remains unmodifed after setting once. Next patch in the series will make use of this flag. It will basically figure out if dentry is of type ORIGIN or not. And this can be derived by this flag. ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Add helper ovl_inode_realdata()Vivek Goyal
Add an helper to retrieve real data inode associated with overlay inode. This helper will ignore all metacopy inodes and will return only the real inode which has data. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Store lower data inode in ovl_inodeVivek Goyal
Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Fix ovl_getattr() to get number of blocks from lowerVivek Goyal
If an inode has been copied up metadata only, then we need to query the number of blocks from lower and fill up the stat->st_blocks. We need to be careful about races where we are doing stat on one cpu and data copy up is taking place on other cpu. We want to return stat->st_blocks either from lower or stable upper and not something in between. Hence, ovl_has_upperdata() is called first to figure out whether block reporting will take place from lower or upper. We now support metacopy dentries in middle layer. That means number of blocks reporting needs to come from lowest data dentry and this could be different from lower dentry. Hence we end up making a separate vfs_getxattr() call for metacopy dentries to get number of blocks. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Add helper ovl_dentry_lowerdata() to get lower data dentryVivek Goyal
Now we have the notion of data dentry and metacopy dentry. ovl_dentry_lower() will return uppermost lower dentry, but it could be either data or metacopy dentry. Now we support metacopy dentries in lower layers so it is possible that lowerstack[0] is metacopy dentry while lowerstack[1] is actual data dentry. So add an helper which returns lowest most dentry which is supposed to be data dentry. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Copy up meta inode data from lowest data inodeVivek Goyal
So far lower could not be a meta inode. So whenever it was time to copy up data of a meta inode, we could copy it up from top most lower dentry. But now lower itself can be a metacopy inode. That means data copy up needs to take place from a data inode in metacopy inode chain. Find lower data inode in the chain and use that for data copy up. Introduced a helper called ovl_path_lowerdata() to find the lower data inode chain. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Modify ovl_lookup() and friends to lookup metacopy dentryVivek Goyal
This patch modifies ovl_lookup() and friends to lookup metacopy dentries. It also allows for presence of metacopy dentries in lower layer. During lookup, check for presence of OVL_XATTR_METACOPY and if not present, set OVL_UPPERDATA bit in flags. We don't support metacopy feature with nfs_export. So in nfs_export code, we set OVL_UPPERDATA flag set unconditionally if upper inode exists. Do not follow metacopy origin if we find a metacopy only inode and metacopy feature is not enabled for that mount. Like redirect, this can have security implications where an attacker could hand craft upper and try to gain access to file on lower which it should not have to begin with. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: A new xattr OVL_XATTR_METACOPY for file on upperVivek Goyal
Now we will have the capability to have upper inodes which might be only metadata copy up and data is still on lower inode. So add a new xattr OVL_XATTR_METACOPY to distinguish between two cases. Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata only and and data will be copied up later from lower origin. So this xattr is set when a metadata copy takes place and cleared when data copy takes place. We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects whether ovl inode has data or not (as opposed to metadata only copy up). If a file is copied up metadata only and later when same file is opened for WRITE, then data copy up takes place. We copy up data, remove METACOPY xattr and then set the UPPERDATA flag in ovl_inode->flags. While all these operations happen with oi->lock held, read side of oi->flags can be lockless. That is another thread on another cpu can check if UPPERDATA flag is set or not. So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if another cpu sees UPPERDATA flag set, then it should be guaranteed that effects of data copy up and remove xattr operations are also visible. For example. CPU1 CPU2 ovl_open() acquire(oi->lock) ovl_open_maybe_copy_up() ovl_copy_up_data() open_open_need_copy_up() vfs_removexattr() ovl_already_copied_up() ovl_dentry_needs_data_copy_up() ovl_set_flag(OVL_UPPERDATA) ovl_test_flag(OVL_UPPERDATA) release(oi->lock) Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if CPU1 perceives the effects of setting UPPERDATA flag but not the effects of preceding operations (ex. upper that is not fully copied up), it will be a problem. Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation and smp_rmb() on UPPERDATA flag test operation. May be some other lock or barrier is already covering it. But I am not sure what that is and is it obvious enough that we will not break it in future. So hence trying to be safe here and introducing barriers explicitly for UPPERDATA flag/bit. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Add helper ovl_already_copied_up()Vivek Goyal
There are couple of places where we need to know if file is already copied up (in lockless manner). Right now its open coded and there are only two conditions to check. Soon this patch series will introduce another condition to check and Amir wants to introduce one more. So introduce a helper instead to check this so that code is easier to read. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Move the copy up helpers to copy_up.cVivek Goyal
Right now two copy up helpers are in inode.c. Amir suggested it might be better to move these to copy_up.c. There will one more related function which will come in later patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Initialize ovl_inode->redirect in ovl_get_inode()Vivek Goyal
ovl_inode->redirect is an inode property and should be initialized in ovl_get_inode() only when we are adding a new inode to cache. If inode is already in cache, it is already initialized and we should not be touching ovl_inode->redirect field. As of now this is not a problem as redirects are used only for directories which don't share inode. But soon I want to use redirects for regular files also and there it can become an issue. Hence, move ->redirect initialization in ovl_get_inode(). Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: stack file opsMiklos Szeredi
Implement file operations on a regular overlay file. The underlying file is opened separately and cached in ->private_data. It might be worth making an exception for such files when accounting in nr_file to confirm to userspace expectations. We are only adding a small overhead (248bytes for the struct file) since the real inode and dentry are pinned by overlayfs anyway. This patch doesn't have any effect, since the vfs will use d_real() to find the real underlying file to open. The patch at the end of the series will actually enable this functionality. AV: make it use open_with_fake_path(), don't mess with override_creds SzM: still need to mess with override_creds() until no fs uses current_cred() in their open method. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-18ovl: copy up file size as wellMiklos Szeredi
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr(). This is in preparation for stacking I/O operations on overlay files. This patch shouldn't have any observable effect. Remove stale comment from ovl_setattr() [spotted by Vivek Goyal]. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: copy up inode flagsMiklos Szeredi
On inode creation copy certain inode flags from the underlying real inode to the overlay inode. This is in preparation for moving overlay functionality out of the VFS. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: copy up timesMiklos Szeredi
Copy up mtime and ctime to overlay inode after times in real object are modified. Be careful not to dirty cachelines when not necessary. This is in preparation for moving overlay functionality out of the VFS. This patch shouldn't have any observable effect. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-05vfs: change inode times to use struct timespec64Deepa Dinamani
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
2018-05-31ovl: use inode_insert5() to hash a newly created inodeAmir Goldstein
Currently, there is a small window where ovl_obtain_alias() can race with ovl_instantiate() and create two different overlay inodes with the same underlying real non-dir non-hardlink inode. The race requires an adversary to guess the file handle of the yet to be created upper inode and decode the guessed file handle after ovl_creat_real(), but before ovl_instantiate(). This race does not affect overlay directory inodes, because those are decoded via ovl_lookup_real() and not with ovl_obtain_alias(). This patch fixes the race, by using inode_insert5() to add a newly created inode to cache. If the newly created inode apears to already exist in cache (hashed by the same real upper inode), we instantiate the dentry with the old inode and drop the new inode, instead of silently not hashing the new inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: Pass argument to ovl_get_inode() in a structureVivek Goyal
ovl_get_inode() right now has 5 parameters. Soon this patch series will add 2 more and suddenly argument list starts looking too long. Hence pass arguments to ovl_get_inode() in a structure and it looks little cleaner. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: create helper ovl_create_temp()Amir Goldstein
Also used ovl_create_temp() in ovl_create_index() instead of calling ovl_do_mkdir() directly, so now all callers of ovl_do_mkdir() are routed through ovl_create_real(), which paves the way for Al's fix for non-hashed result from vfs_mkdir(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: return dentry from ovl_create_real()Miklos Szeredi
Al Viro suggested to simplify callers of ovl_create_real() by returning the created dentry (or ERR_PTR) from ovl_create_real(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: struct cattr cleanupsAmir Goldstein
* Rename to ovl_cattr * Fold ovl_create_real() hardlink argument into struct ovl_cattr * Create macro OVL_CATTR() to initialize struct ovl_cattr from mode Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: strip debug argument from ovl_do_ helpersAmir Goldstein
It did not prove to be useful. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: consistent i_ino for non-samefs with xinoAmir Goldstein
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, set i_ino value to the same value as st_ino for nfsd readdirplus validator. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: constant st_ino for non-samefs with xinoAmir Goldstein
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: factor out ovl_map_dev_ino() helperAmir Goldstein
A helper for ovl_getattr() to map the values of st_dev and st_ino according to constant st_ino rules. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: do not try to reconnect a disconnected origin dentryAmir Goldstein
On lookup of non directory, we try to decode the origin file handle stored in upper inode. The origin file handle is supposed to be decoded to a disconnected non-dir dentry, which is fine, because we only need the lower inode of a copy up origin. However, if the origin file handle somehow turns out to be a directory we pay the expensive cost of reconnecting the directory dentry, only to get a mismatch file type and drop the dentry. Optimize this case by explicitly opting out of reconnecting the dentry. Opting-out of reconnect is done by passing a NULL acceptable callback to exportfs_decode_fh(). While the case described above is a strange corner case that does not really need to be optimized, the API added for this optimization will be used by a following patch to optimize a more common case of decoding an overlayfs file handle. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: disambiguate ovl_encode_fh()Amir Goldstein
Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the exportfs function ovl_encode_inode_fh() and change the latter to ovl_encode_fh() to match the exportfs method name. Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16ovl: check lower ancestry on encode of lower dir file handleAmir Goldstein
This change relaxes copy up on encode of merge dir with lower layer > 1 and handles the case of encoding a merge dir with lower layer 1, where an ancestor is a non-indexed merge dir. In that case, decode of the lower file handle will not have been possible if the non-indexed ancestor is redirected before or after encode. Before encoding a non-upper directory file handle from real layer N, we need to check if it will be possible to reconnect an overlay dentry from the real lower decoded dentry. This is done by following the overlay ancestry up to a "layer N connected" ancestor and verifying that all parents along the way are "layer N connectable". If an ancestor that is NOT "layer N connectable" is found, we need to copy up an ancestor, which is "layer N connectable", thus making that ancestor "layer N connected". For example: layer 1: /a layer 2: /a/b/c The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is copied up and renamed, upper dir /a will be indexed by lower dir /a from layer 1. The dir /a from layer 2 will never be indexed, so the algorithm in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected overlay dentry from the connected lower dentry /a/b/c. To avoid this problem on decode time, we need to copy up an ancestor of /a/b/c, which is "layer 2 connectable", on encode time. That ancestor is /a/b. After copy up (and index) of /a/b, it will become "layer 2 connected" and when the time comes to decode the file handle from lower dentry /a/b/c, ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding a connected overlay dentry will be accomplished. (*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup an entry /a in the lower layers above layer N and find the indexed dir /a from layer 1. If that improvement is made, then the check for "layer N connected" will need to verify there are no redirects in lower layers above layer N. In the example above, /a will be "layer 2 connectable". However, if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be "layer 2 connectable": layer 1: /A (redirect = /a) layer 2: /a/b/c Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: lookup indexed ancestor of lower dirAmir Goldstein
ovl_lookup_real() in lower layer walks back lower parents to find the topmost indexed parent. If an indexed ancestor is found before reaching lower layer root, ovl_lookup_real() is called recursively with upper layer to walk back from indexed upper to the topmost connected/hashed upper parent (or up to root). ovl_lookup_real() in upper layer then walks forward to connect the topmost upper overlay dir dentry and ovl_lookup_real() in lower layer continues to walk forward to connect the decoded lower overlay dir dentry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: lookup connected ancestor of dir in inode cacheAmir Goldstein
Decoding a dir file handle requires walking backward up to layer root and for lower dir also checking the index to see if any of the parents have been copied up. Lookup overlay ancestor dentry in inode/dentry cache by decoded real parents to shortcut looking up all the way back to layer root. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: decode indexed dir file handlesAmir Goldstein
Decoding an indexed dir file handle is done by looking up the file handle in index dir by name and then decoding the upper dir from the index origin file handle. The decoded upper path is used to lookup an overlay dentry of the same path. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: decode lower file handles of unlinked but open filesAmir Goldstein
Lookup overlay inode in cache by origin inode, so we can decode a file handle of an open file even if the index has a whiteout index entry to mark this overlay inode was unlinked. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: decode lower non-dir file handlesAmir Goldstein
Decoding a lower non-dir file handle is done by decoding the lower dentry from underlying lower fs, finding or allocating an overlay inode that is hashed by the real lower inode and instantiating an overlay dentry with that inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: decode pure upper file handlesAmir Goldstein
Decoding an upper file handle is done by decoding the upper dentry from underlying upper fs, finding or allocating an overlay inode that is hashed by the real upper inode and instantiating an overlay dentry with that inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: encode pure upper file handlesAmir Goldstein
Encode overlay file handles as struct ovl_fh containing the file handle encoding of the real upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store 'has_upper' and 'opaque' as bit flagsAmir Goldstein
We need to make some room in struct ovl_entry to store information about redirected ancestors for NFS export, so cram two booleans as bit flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: do not pass overlay dentry to ovl_get_inode()Amir Goldstein
This is needed for using ovl_get_inode() for decoding file handles for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: factor out ovl_get_index_fh() helperAmir Goldstein
The helper is needed to lookup an index by file handle for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: whiteout index when union nlink drops to zeroAmir Goldstein
With NFS export feature enabled, when overlay inode nlink drops to zero, instead of removing the index entry, replace it with a whiteout index entry. This is needed for NFS export in order to prevent future open by handle from opening the lower file directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: create ovl_need_index() helperAmir Goldstein
The helper determines which lower file needs to be indexed on copy up and before nlink changes. For index=on, the helper evaluates to true for lower hardlinks. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: use directory index entries for consistency verificationAmir Goldstein
A directory index is a directory type entry in index dir with a "trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge directory upper dir inode. On lookup of non-dir files, lower file is followed by origin file handle. On lookup of dir entries, lower dir is found by name and then compared to origin file handle. We only trust dir index if we verified that lower dir matches origin file handle, otherwise index may be inconsistent and we ignore it. If we find an indexed non-upper dir or an indexed merged dir, whose index 'upper' xattr points to a different upper dir, that means that the lower directory may be also referenced by another upper dir via redirect, so we fail the lookup on inconsistency error. To be consistent with directory index entries format, the association of index dir to upper root dir, that was stored by older kernels in "trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper" xattr. This also serves as an indication that overlay was mounted with a kernel that support index directory entries. For backward compatibility, if an 'origin' xattr exists on the index dir we also verify it on mount. Directory index entries are going to be used for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: add support for "nfs_export" configurationAmir Goldstein
Introduce the "nfs_export" config, module and mount options. The NFS export feature depends on the "index" feature and enables two implicit overlayfs features: "index_all" and "verify_lower". The "index_all" feature creates an index on copy up of every file and directory. The "verify_lower" feature uses the full index to detect overlay filesystems inconsistencies on lookup, like redirect from multiple upper dirs to the same lower dir. NFS export can be enabled for non-upper mount with no index. However, because lower layer redirects cannot be verified with the index, enabling NFS export support on an overlay with no upper layer requires turning off redirect follow (e.g. "redirect_dir=nofollow"). The full index may incur some overhead on mount time, especially when verifying that lower directory file handles are not stale. NFS export support, full index and consistency verification will be implemented by following patches. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: generalize ovl_verify_origin() and helpersAmir Goldstein
Remove the "origin" language from the functions that handle set, get and verify of "origin" xattr and pass the xattr name as an argument. The same helpers are going to be used for NFS export to get, get and verify the "upper" xattr for directory index entries. ovl_verify_origin() is now a helper used only to verify non upper file handle stored in "origin" xattr of upper inode. The upper root dir file handle is still stored in "origin" xattr on the index dir for backward compatibility. This is going to be changed by the patch that adds directory index entries support. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>