summaryrefslogtreecommitdiff
path: root/fs/overlayfs/inode.c
AgeCommit message (Collapse)Author
2018-07-20ovl: Do not do metadata only copy-up for truncate operationVivek Goyal
truncate should copy up full file (and not do metacopy only), otherwise it will be broken. For example, use truncate to increase size of a file so that any read beyong existing size will return null bytes. If we don't copy up full file, then we end up opening lower file and read from it only reads upto the old size (and not new size after truncate). Hence to avoid such situations, copy up data as well when file size changes. So far it was being done by d_real(O_WRONLY) call in truncate() path. Now that patch has been reverted. So force full copy up in ovl_setattr() if size of file is changing. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Add an inode flag OVL_CONST_INOVivek Goyal
Add an ovl_inode flag OVL_CONST_INO. This flag signifies if inode number will remain constant over copy up or not. This flag does not get updated over copy up and remains unmodifed after setting once. Next patch in the series will make use of this flag. It will basically figure out if dentry is of type ORIGIN or not. And this can be derived by this flag. ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Store lower data inode in ovl_inodeVivek Goyal
Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Fix ovl_getattr() to get number of blocks from lowerVivek Goyal
If an inode has been copied up metadata only, then we need to query the number of blocks from lower and fill up the stat->st_blocks. We need to be careful about races where we are doing stat on one cpu and data copy up is taking place on other cpu. We want to return stat->st_blocks either from lower or stable upper and not something in between. Hence, ovl_has_upperdata() is called first to figure out whether block reporting will take place from lower or upper. We now support metacopy dentries in middle layer. That means number of blocks reporting needs to come from lowest data dentry and this could be different from lower dentry. Hence we end up making a separate vfs_getxattr() call for metacopy dentries to get number of blocks. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Modify ovl_lookup() and friends to lookup metacopy dentryVivek Goyal
This patch modifies ovl_lookup() and friends to lookup metacopy dentries. It also allows for presence of metacopy dentries in lower layer. During lookup, check for presence of OVL_XATTR_METACOPY and if not present, set OVL_UPPERDATA bit in flags. We don't support metacopy feature with nfs_export. So in nfs_export code, we set OVL_UPPERDATA flag set unconditionally if upper inode exists. Do not follow metacopy origin if we find a metacopy only inode and metacopy feature is not enabled for that mount. Like redirect, this can have security implications where an attacker could hand craft upper and try to gain access to file on lower which it should not have to begin with. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Use out_err instead of out_nomemVivek Goyal
Right now we use goto out_nomem which assumes error code is -ENOMEM. But there are other errors returned like -ESTALE as well. So instead of out_nomem, use out_err which will do ERR_PTR(err). That way one can put error code in err and jump to out_err. This just code reorganization and no change of functionality. I am about to add more code and this organization helps laying more code and error paths on top of it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Move the copy up helpers to copy_up.cVivek Goyal
Right now two copy up helpers are in inode.c. Amir suggested it might be better to move these to copy_up.c. There will one more related function which will come in later patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Initialize ovl_inode->redirect in ovl_get_inode()Vivek Goyal
ovl_inode->redirect is an inode property and should be initialized in ovl_get_inode() only when we are adding a new inode to cache. If inode is already in cache, it is already initialized and we should not be touching ovl_inode->redirect field. As of now this is not a problem as redirects are used only for directories which don't share inode. But soon I want to use redirects for regular files also and there it can become an issue. Hence, move ->redirect initialization in ovl_get_inode(). Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: add ovl_fiemap()Miklos Szeredi
Implement stacked fiemap(). Need to split inode operations for regular file (which has fiemap) and special file (which doesn't have fiemap). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: stack file opsMiklos Szeredi
Implement file operations on a regular overlay file. The underlying file is opened separately and cached in ->private_data. It might be worth making an exception for such files when accounting in nr_file to confirm to userspace expectations. We are only adding a small overhead (248bytes for the struct file) since the real inode and dentry are pinned by overlayfs anyway. This patch doesn't have any effect, since the vfs will use d_real() to find the real underlying file to open. The patch at the end of the series will actually enable this functionality. AV: make it use open_with_fake_path(), don't mess with override_creds SzM: still need to mess with override_creds() until no fs uses current_cred() in their open method. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-18ovl: copy up file size as wellMiklos Szeredi
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr(). This is in preparation for stacking I/O operations on overlay files. This patch shouldn't have any observable effect. Remove stale comment from ovl_setattr() [spotted by Vivek Goyal]. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18Revert "Revert "ovl: get_write_access() in truncate""Miklos Szeredi
This reverts commit 31c3a7069593b072bd57192b63b62f9a7e994e9a. Re-add functionality dealing with i_writecount on truncate to overlayfs. This patch shouldn't have any observable effects, since we just re-assert the writecout that vfs_truncate() already got for us. This is in preparation for moving overlay functionality out of the VFS. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: copy up timesMiklos Szeredi
Copy up mtime and ctime to overlay inode after times in real object are modified. Be careful not to dirty cachelines when not necessary. This is in preparation for moving overlay functionality out of the VFS. This patch shouldn't have any observable effect. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-05vfs: change inode times to use struct timespec64Deepa Dinamani
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
2018-05-31ovl: use inode_insert5() to hash a newly created inodeAmir Goldstein
Currently, there is a small window where ovl_obtain_alias() can race with ovl_instantiate() and create two different overlay inodes with the same underlying real non-dir non-hardlink inode. The race requires an adversary to guess the file handle of the yet to be created upper inode and decode the guessed file handle after ovl_creat_real(), but before ovl_instantiate(). This race does not affect overlay directory inodes, because those are decoded via ovl_lookup_real() and not with ovl_obtain_alias(). This patch fixes the race, by using inode_insert5() to add a newly created inode to cache. If the newly created inode apears to already exist in cache (hashed by the same real upper inode), we instantiate the dentry with the old inode and drop the new inode, instead of silently not hashing the new inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: Pass argument to ovl_get_inode() in a structureVivek Goyal
ovl_get_inode() right now has 5 parameters. Soon this patch series will add 2 more and suddenly argument list starts looking too long. Hence pass arguments to ovl_get_inode() in a structure and it looks little cleaner. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: consistent i_ino for non-samefs with xinoAmir Goldstein
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, set i_ino value to the same value as st_ino for nfsd readdirplus validator. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: constant st_ino for non-samefs with xinoAmir Goldstein
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: allocate anon bdev per unique lower fsAmir Goldstein
Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: factor out ovl_map_dev_ino() helperAmir Goldstein
A helper for ovl_getattr() to map the values of st_dev and st_ino according to constant st_ino rules. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: cleanup ovl_update_time()Miklos Szeredi
No need to mess with an alias, the upperdentry can be retrieved directly from the overlay inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: cleanup setting OVL_INDEXVivek Goyal
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: set lower layer st_dev only if setting lower st_inoAmir Goldstein
For broken hardlinks, we do not return lower st_ino, so we should also not return lower pseudo st_dev. Fixes: a0c5ad307ac0 ("ovl: relax same fs constraint for constant st_ino") Cc: <stable@vger.kernel.org> #v4.15 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: set i_ino to the value of st_ino for NFS exportAmir Goldstein
Eddie Horng reported that readdir of an overlayfs directory that was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN. The reason is that while preparing the response for readdirplus, nfsd checks inside encode_entryplus_baggage() that a child dentry's inode number matches the value of d_ino returns by overlayfs readdir iterator. Because the overlayfs inodes use arbitrary inode numbers that are not correlated with the values of st_ino/d_ino, NFSv3 falls back to not encoding d_type. Although this is an allowed behavior, we can fix it for the case of all overlayfs layers on the same underlying filesystem. When NFS export is enabled and d_ino is consistent with st_ino (samefs), set the same value also to i_ino in ovl_fill_inode() for all overlayfs inodes, nfsd readdirplus sanity checks will pass. ovl_fill_inode() may be called from ovl_new_inode(), before real inode was created with ino arg 0. In that case, i_ino will be updated to real upper inode i_ino on ovl_inode_init() or ovl_inode_update(). Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: <stable@vger.kernel.org> #v4.16 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16ovl: hash non-dir by lower inode for fsnotifyAmir Goldstein
Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify") fixed an issue of inotify watch on directory that stops getting events after dropping dentry caches. A similar issue exists for non-dir non-upper files, for example: $ mkdir -p lower upper work merged $ touch lower/foo $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged $ inotifywait merged/foo & $ echo 2 > /proc/sys/vm/drop_caches $ cat merged/foo inotifywait doesn't get the OPEN event, because ovl_lookup() called from 'cat' allocates a new overlay inode and does not reuse the watched inode. Fix this by hashing non-dir overlay inodes by lower real inode in the following cases that were not hashed before this change: - A non-upper overlay mount - A lower non-hardlink when index=off A helper ovl_hash_bylower() was added to put all the logic and documentation about which real inode an overlay inode is hashed by into one place. The issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: lookup connected ancestor of dir in inode cacheAmir Goldstein
Decoding a dir file handle requires walking backward up to layer root and for lower dir also checking the index to see if any of the parents have been copied up. Lookup overlay ancestor dentry in inode/dentry cache by decoded real parents to shortcut looking up all the way back to layer root. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: hash non-indexed dir by upper inode for NFS exportAmir Goldstein
Non-indexed upper dirs are encoded as upper file handles. When NFS export is enabled, hash non-indexed directory inodes by upper inode, so we can find them in inode cache using the decoded upper inode. When NFS export is disabled, directories are not indexed on copy up, so hash non-indexed directory inodes by origin inode, the same hash key that is used before copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: decode lower file handles of unlinked but open filesAmir Goldstein
Lookup overlay inode in cache by origin inode, so we can decode a file handle of an open file even if the index has a whiteout index entry to mark this overlay inode was unlinked. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: copy up of disconnected dentriesAmir Goldstein
With NFS export, some operations on decoded file handles (e.g. open, link, setattr, xattr_set) may call copy up with a disconnected non-dir. In this case, we will copy up lower inode to index dir without linking it to upper dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: do not pass overlay dentry to ovl_get_inode()Amir Goldstein
This is needed for using ovl_get_inode() for decoding file handles for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: unbless lower st_ino of unverified originAmir Goldstein
On a malformed overlay, several redirected dirs can point to the same dir on a lower layer. This presents a similar challenge as broken hardlinks, because different objects in the overlay can return the same st_ino/st_dev pair from stat(2). For broken hardlinks, we do not provide constant st_ino on copy up to avoid this inconsistency. When NFS export feature is enabled, apply the same logic to files and directories with unverified lower origin. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19ovl: hash directory inodes for fsnotifyAmir Goldstein
fsnotify pins a watched directory inode in cache, but if directory dentry is released, new lookup will allocate a new dentry and a new inode. Directory events will be notified on the new inode, while fsnotify listener is watching the old pinned inode. Hash all directory inodes to reuse the pinned inode on lookup. Pure upper dirs are hashes by real upper inode, merge and lower dirs are hashed by real lower inode. The reference to lower inode was being held by the lower dentry object in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry may drop lower inode refcount to zero. Add a refcount on behalf of the overlay inode to prevent that. As a by-product, hashing directory inodes also detects multiple redirected dirs to the same lower dir and uncovered redirected dir target on and returns -ESTALE on lookup. The reported issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com>
2017-11-09ovl: relax same fs constraint for constant st_inoAmir Goldstein
For the case of all layers not on the same fs, return the copy up origin inode st_dev/st_ino for non-dir from stat(2). This guaranties constant st_dev/st_ino for non-dir across copy up. Like the same fs case, st_ino of non-dir is also persistent. If the st_dev/st_ino for copied up object would have been the same as that of the real underlying lower file, running diff on underlying lower file and overlay copied up file would result in diff reporting that the two files are equal when in fact, they may have different content. Therefore, unlike the same fs case, st_dev is not persistent because it uses the unique anonymous bdev allocated for the lower layer. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09ovl: return anonymous st_dev for lower inodesChandan Rajendra
For non-samefs setup, to make sure that st_dev/st_ino pair is unique across the system, we return a unique anonymous st_dev for stat(2) of lower layer inode. A following patch is going to fix constant st_dev/st_ino across copy up by returning origin st_dev/st_ino for copied up objects. If the st_dev/st_ino for copied up object would have been the same as that of the real underlying lower file, running diff on underlying lower file and overlay copied up file would result in diff reporting that the 2 files are equal when in fact, they may have different content. [amir: simplify ovl_get_pseudo_dev() split from allocate anonymous bdev patch] Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09ovl: move include of ovl_entry.h into overlayfs.hAmir Goldstein
Most overlayfs c files already explicitly include ovl_entry.h to use overlay entry struct definitions and upcoming changes are going to require even more c files to include this header. All overlayfs c files include overlayfs.h and overlayfs.h itself refers to some structs defined in ovl_entry.h, so it seems more logic to include ovl_entry.h from overlayfs.h than from c files. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09ovl: no direct iteration for dir with origin xattrAmir Goldstein
If a non-merge dir in an overlay mount has an overlay.origin xattr, it means it was once an upper merge dir, which may contain whiteouts and then the lower dir was removed under it. Do not iterate real dir directly in this case to avoid exposing whiteouts. [SzM] Set OVL_WHITEOUT for all merge directories as well. [amir] A directory that was just copied up does not have the OVL_WHITEOUTS flag. We need to set it to fix merge dir iteration. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09ovl: lockdep annotate of nested OVL_I(inode)->lockAmir Goldstein
This fixes a lockdep splat when mounting a nested overlayfs. Fixes: a015dafcaf5b ("ovl: use ovl_inode mutex to synchronize...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-24ovl: fix EIO from lookup of non-indexed upperAmir Goldstein
Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink") attempt to avoid the condition of non-indexed upper inode with lower hardlink as origin. If this condition is found, lookup returns EIO. The protection of commit mentioned above does not cover the case of lower that is not a hardlink when it is copied up (with either index=off/on) and then lower is hardlinked while overlay is offline. Changes to lower layer while overlayfs is offline should not result in unexpected behavior, so a permanent EIO error after creating a link in lower layer should not be considered as correct behavior. This fix replaces EIO error with success in cases where upper has origin but no index is found, or index is found that does not match upper inode. In those cases, lookup will not fail and the returned overlay inode will be hashed by upper inode instead of by lower origin inode. Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12ovl: fix false positive ESTALE on lookupAmir Goldstein
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27ovl: check snprintf returnMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20ovl: fix xattr get and set with selinuxMiklos Szeredi
inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which depends on dentry->d_inode, but d_inode is null and not initialized yet at this point resulting in an Oops. Fix by getting the upperdentry info from the inode directly in this case. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: cleanup orphan index entriesAmir Goldstein
index entry should live only as long as there are upper or lower hardlinks. Cleanup orphan index entries on mount and when dropping the last overlay inode nlink. When about to cleanup or link up to orphan index and the index inode nlink > 1, admit that something went wrong and adjust overlay nlink to index inode nlink - 1 to prevent it from dropping below zero. This could happen when adding lower hardlinks underneath a mounted overlay and then trying to unlink them. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: persistent overlay inode nlink for indexed inodesAmir Goldstein
With inodes index enabled, an overlay inode nlink counts the union of upper and non-covered lower hardlinks. During the lifetime of a non-pure upper inode, the following nlink modifying operations can happen: 1. Lower hardlink copy up 2. Upper hardlink created, unlinked or renamed over 3. Lower hardlink whiteout or renamed over For the first, copy up case, the union nlink does not change, whether the operation succeeds or fails, but the upper inode nlink may change. Therefore, before copy up, we store the union nlink value relative to the lower inode nlink in the index inode xattr trusted.overlay.nlink. For the second, upper hardlink case, the union nlink should be incremented or decremented IFF the operation succeeds, aligned with nlink change of the upper inode. Therefore, before link/unlink/rename, we store the union nlink value relative to the upper inode nlink in the index inode. For the last, lower cover up case, we simplify things by preceding the whiteout or cover up with copy up. This makes sure that there is an index upper inode where the nlink xattr can be stored before the copied up upper entry is unlink. Return the overlay inode nlinks for indexed upper inodes on stat(2). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: implement index dir copy upAmir Goldstein
Implement a copy up method for non-dir objects using index dir to prevent breaking lower hardlinks on copy up. This method requires that the inodes index dir feature was enabled and that all underlying fs support file handle encoding/decoding. On the first lower hardlink copy up, upper file is created in index dir, named after the hex representation of the lower origin inode file handle. On the second lower hardlink copy up, upper file is found in index dir, by the same lower handle key. On either case, the upper indexed inode is then linked to the copy up upper path. The index entry remains linked for future lower hardlink copy up and for lower to upper inode map, that is needed for exporting overlayfs to NFS. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: hash overlay non-dir inodes by copy up originMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: lookup index entry for copy up originAmir Goldstein
When inodes index feature is enabled, lookup in indexdir for the index entry of lower real inode or copy up origin inode. The index entry name is the hex representation of the lower inode file handle. If the index dentry in negative, then either no lower aliases have been copied up yet, or aliases have been copied up in older kernels and are not indexed. If the index dentry for a copy up origin inode is positive, but points to an inode different than the upper inode, then either the upper inode has been copied up and not indexed or it was indexed, but since then index dir was cleared. Either way, that index cannot be used to indentify the overlay inode. If a positive dentry that matches the upper inode was found, then it is safe to use the copy up origin st_ino for upper hardlinks, because all indexed upper hardlinks are represented by the same overlay inode as the copy up origin. Set the INDEX type flag on an indexed upper dentry. A non-upper dentry may also have a positive index from copy up of another lower hardlink. This situation will be handled by following patches. Index lookup is going to be used to prevent breaking hardlinks on copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move impure to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: move __upperdentry to ovl_inodeMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04ovl: use i_private only as a keyMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>