summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-03-19f2fs: show mounted timeJaegeuk Kim
Let's show mounted time. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19f2fs: Use scnprintf() for avoiding potential buffer overflowTakashi Iwai
Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19Merge tag '5.6-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small smb3 fixes, two for stable" * tag '5.6-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: CIFS: fiemap: do not return EINVAL if get nothing CIFS: Increment num_remote_opens stats counter even in case of smb2_query_dir_first cifs: potential unintitliazed error code in cifs_getattr()
2020-03-19xfs: remove the di_version field from struct icdinodeChristoph Hellwig
We know the version is 3 if on a v5 file system. For earlier file systems formats we always upgrade the remaining v1 inodes to v2 and thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode helper to check if we deal with small or large dinodes, and thus remove the need for the di_version field in struct icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19xfs: simplify a check in xfs_ioctl_setattr_check_cowextsizeChristoph Hellwig
Only v5 file systems can have the reflink feature, and those will always use the large dinode format. Remove the extra check for the inode version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19xfs: simplify di_flags2 inheritance in xfs_iallocChristoph Hellwig
di_flags2 is initialized to zero for v4 and earlier file systems. This means di_flags2 can only be non-zero for a v5 file systems, in which case both the parent and child inodes can store the field. Remove the extra di_version check, and also remove the rather pointless local di_flags2 variable while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19xfs: only check the superblock version for dinode size calculationChristoph Hellwig
The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19xfs: add a new xfs_sb_version_has_v3inode helperChristoph Hellwig
Add a new wrapper to check if a file system supports the v3 inode format with a larger dinode core. Previously we used xfs_sb_version_hascrc for that, which is technically correct but a little confusing to read. Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode so that we have one place that documents the superblock version to inode version relationship. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19nfsd: fsnotify on rmdir under nfsd/clients/J. Bruce Fields
Userspace should be able to monitor nfsd/clients/ to see when clients come and go, but we're failing to send fsnotify events. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-19nfsd4: kill warnings on testing stateids with mismatched clientidsJ. Bruce Fields
It's normal for a client to test a stateid from a previous instance, e.g. after a network partition. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-18locks: reinstate locks_delete_block optimizationLinus Torvalds
There is measurable performance impact in some synthetic tests due to commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter). Fix the race condition instead by clearing the fl_blocker pointer after the wake_up, using explicit acquire/release semantics. This does mean that we can no longer use the clearing of fl_blocker as the wait condition, so switch the waiters over to checking whether the fl_blocked_member list_head is empty. Reviewed-by: yangerkun <yangerkun@huawei.com> Reviewed-by: NeilBrown <neilb@suse.de> Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter) Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-18xfs: fix unmount hang and memory leak on shutdown during quotaoffBrian Foster
AIL removal of the quotaoff start intent and free of both quotaoff intents is currently limited to the ->iop_committed() handler of the end intent. This executes when the end intent is committed to the on-disk log and marks the completion of the operation. The problem with this is it assumes the success of the operation. If a shutdown or other error occurs during the quotaoff, it's possible for the quotaoff task to exit without removing the start intent from the AIL. This results in an unmount hang as the AIL cannot be emptied. Further, no other codepath frees the intents and so this is also a memory leak vector. First, update the high level quotaoff error path to directly remove and free the quotaoff start intent if it still exists in the AIL at the time of the error. Next, update both of the start and end quotaoff intents with an ->iop_release() callback to properly handle transaction abort. This means that If the quotaoff start transaction aborts, it frees the start intent in the transaction commit path. If the filesystem shuts down before the end transaction allocates, the quotaoff sequence removes and frees the start intent. If the end transaction aborts, it removes the start intent and frees both. This ensures that a shutdown does not result in a hung unmount and that memory is not leaked regardless of when a quotaoff error occurs. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-18xfs: factor out quotaoff intent AIL removal and memory freeBrian Foster
AIL removal of the quotaoff start intent and free of both intents is hardcoded to the ->iop_committed() handler of the end intent. Factor out the start intent handling code so it can be used in a future patch to properly handle quotaoff errors. Use xfs_trans_ail_remove() instead of the _delete() variant to acquire the AIL lock and also handle cases where an intent might not reside in the AIL at the time of a failure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-18xfs: add support for rmap btree staging cursorsDarrick J. Wong
Add support for btree staging cursors for the rmap btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: add support for refcount btree staging cursorsDarrick J. Wong
Add support for btree staging cursors for the refcount btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: add support for inode btree staging cursorsDarrick J. Wong
Add support for btree staging cursors for the inode btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: add support for free space btree staging cursorsDarrick J. Wong
Add support for btree staging cursors for the free space btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: support bulk loading of staged btreesDarrick J. Wong
Add a new btree function that enables us to bulk load a btree cursor. This will be used by the upcoming online repair patches to generate new btrees. This avoids the programmatic inefficiency of calling xfs_btree_insert in a loop (which generates a lot of log traffic) in favor of stamping out new btree blocks with ordered buffers, and then committing both the new root and scheduling the removal of the old btree blocks in a single transaction commit. The design of this new generic code is based off the btree rebuilding code in xfs_repair's phase 5 code, with the explicit goal of enabling us to share that code between scrub and repair. It has the additional feature of being able to control btree block loading factors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: introduce fake roots for inode-rooted btreesDarrick J. Wong
Create an in-core fake root for inode-rooted btree types so that callers can generate a whole new btree using the upcoming btree bulk load function without making the new tree accessible from the rest of the filesystem. It is up to the individual btree type to provide a function to create a staged cursor (presumably with the appropriate callouts to update the fakeroot) and then commit the staged root back into the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: introduce fake roots for ag-rooted btreesDarrick J. Wong
Create an in-core fake root for AG-rooted btree types so that callers can generate a whole new btree using the upcoming btree bulk load function without making the new tree accessible from the rest of the filesystem. It is up to the individual btree type to provide a function to create a staged cursor (presumably with the appropriate callouts to update the fakeroot) and then commit the staged root back into the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: replace open-coded bitmap weight logicDarrick J. Wong
Add a xbitmap_hweight helper function so that we can get rid of the open-coded loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: rename xfs_bitmap to xbitmapDarrick J. Wong
Shorten the name of xfs_bitmap to xbitmap since the scrub bitmap has nothing to do with the libxfs bitmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18xfs: xrep_reap_extents should not destroy the bitmapDarrick J. Wong
Remove the xfs_bitmap_destroy call from the end of xrep_reap_extents because this sort of violates our rule that the function initializing a structure should destroy it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18iomap: fix comments in iomap_dio_rwyangerkun
Double 'three' exists in the comments of iomap_dio_rw. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-18block: fix a device invalidation regressionChristoph Hellwig
Historically we only set the capacity to zero for devices that support partitions (independ of actually having partitions created). Doing that is rather inconsistent, but changing it broke legacy udisks polling for legacy ide-cdrom devices. Use the crude a crude check for devices that either are non-removable or partitionable to get the sane behavior for most device while not breaking userspace for this particular setup. Fixes: a1548b674403 ("block: move rescan_partitions to fs/block_dev.c") Reported-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-18debugfs: remove return value of debugfs_create_file_size()Greg Kroah-Hartman
No one checks the return value of debugfs_create_file_size, as it's not needed, so make the return value void, so that no one tries to do so in the future. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200309163640.237984-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18debugfs: Check module state before warning in {full/open}_proxy_open()Taehee Yoo
When the module is being removed, the module state is set to MODULE_STATE_GOING. At this point, try_module_get() fails. And when {full/open}_proxy_open() is being called, it calls try_module_get() to try to hold module reference count. If it fails, it warns about the possibility of debugfs file leak. If {full/open}_proxy_open() is called while the module is being removed, it fails to hold the module. So, It warns about debugfs file leak. But it is not the debugfs file leak case. So, this patch just adds module state checking routine in the {full/open}_proxy_open(). Test commands: #SHELL1 while : do modprobe netdevsim echo 1 > /sys/bus/netdevsim/new_device modprobe -rv netdevsim done #SHELL2 while : do cat /sys/kernel/debug/netdevsim/netdevsim1/ports/0/ipsec done Splat looks like: [ 298.766738][T14664] debugfs file owner did not clean up at exit: ipsec [ 298.766766][T14664] WARNING: CPU: 2 PID: 14664 at fs/debugfs/file.c:312 full_proxy_open+0x10f/0x650 [ 298.768595][T14664] Modules linked in: netdevsim(-) openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 n][ 298.771343][T14664] CPU: 2 PID: 14664 Comm: cat Tainted: G W 5.5.0+ #1 [ 298.772373][T14664] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 298.773545][T14664] RIP: 0010:full_proxy_open+0x10f/0x650 [ 298.774247][T14664] Code: 48 c1 ea 03 80 3c 02 00 0f 85 c1 04 00 00 49 8b 3c 24 e8 e4 b5 78 ff 84 c0 75 2d 4c 89 ee 48 [ 298.776782][T14664] RSP: 0018:ffff88805b7df9b8 EFLAGS: 00010282[ 298.777583][T14664] RAX: dffffc0000000008 RBX: ffff8880511725c0 RCX: 0000000000000000 [ 298.778610][T14664] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8880540c5c14 [ 298.779637][T14664] RBP: 0000000000000000 R08: fffffbfff15235ad R09: 0000000000000000 [ 298.780664][T14664] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffc06b5000 [ 298.781702][T14664] R13: ffff88804c234a88 R14: ffff88804c22dd00 R15: ffffffff8a1b5660 [ 298.782722][T14664] FS: 00007fafa13a8540(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 298.783845][T14664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 298.784672][T14664] CR2: 00007fafa0e9cd10 CR3: 000000004b286005 CR4: 00000000000606e0 [ 298.785739][T14664] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 298.786769][T14664] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 298.787785][T14664] Call Trace: [ 298.788237][T14664] do_dentry_open+0x63c/0xf50 [ 298.788872][T14664] ? open_proxy_open+0x270/0x270 [ 298.789524][T14664] ? __x64_sys_fchdir+0x180/0x180 [ 298.790169][T14664] ? inode_permission+0x65/0x390 [ 298.790832][T14664] path_openat+0xc45/0x2680 [ 298.791425][T14664] ? save_stack+0x69/0x80 [ 298.791988][T14664] ? save_stack+0x19/0x80 [ 298.792544][T14664] ? path_mountpoint+0x2e0/0x2e0 [ 298.793233][T14664] ? check_chain_key+0x236/0x5d0 [ 298.793910][T14664] ? sched_clock_cpu+0x18/0x170 [ 298.794527][T14664] ? find_held_lock+0x39/0x1d0 [ 298.795153][T14664] do_filp_open+0x16a/0x260 [ ... ] Fixes: 9fd4dcece43a ("debugfs: prevent access to possibly dead file_operations at file open") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20200218043150.29447-1-ap420073@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-17nfs: Fix up documentation in nfs_follow_referral() and nfs_do_submount()Trond Myklebust
Fallout from the mount patches. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-17CIFS: fiemap: do not return EINVAL if get nothingMurphy Zhou
If we call fiemap on a truncated file with none blocks allocated, it makes sense we get nothing from this call. No output means no blocks have been counted, but the call succeeded. It's a valid response. Simple example reproducer: xfs_io -f 'truncate 2M' -c 'fiemap -v' /cifssch/testfile xfs_io: ioctl(FS_IOC_FIEMAP) ["/cifssch/testfile"]: Invalid argument Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
2020-03-17CIFS: Increment num_remote_opens stats counter even in case of ↵Shyam Prasad N
smb2_query_dir_first The num_remote_opens counter keeps track of the number of open files which must be maintained by the server at any point. This is a per-tree-connect counter, and the value of this counter gets displayed in the /proc/fs/cifs/Stats output as a following... Open files: 0 total (local), 1 open on server ^^^^^^^^^^^^^^^^ As a thumb-rule, we want to increment this counter for each open/create that we successfully execute on the server. Similarly, we should decrement the counter when we successfully execute a close. In this case, an increment was being missed in case of smb2_query_dir_first, in case of successful open. As a result, we would underflow the counter and we could even see the counter go to negative after sufficient smb2_query_dir_first calls. I tested the stats counter for a bunch of filesystem operations with the fix. And it looks like the counter looks correct to me. I also check if we missed the increments and decrements elsewhere. It does not seem so. Few other cases where an open is done and we don't increment the counter are the compound calls where the corresponding close is also sent in the request. Signed-off-by: Shyam Prasad N <nspmangalore@gmail.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-03-17cifs: potential unintitliazed error code in cifs_getattr()Dan Carpenter
Smatch complains that "rc" could be uninitialized. fs/cifs/inode.c:2206 cifs_getattr() error: uninitialized symbol 'rc'. Changing it to "return 0;" improves readability as well. Fixes: cc1baf98c8f6 ("cifs: do not ignore the SYNC flags in getattr") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-03-17xfs: fix incorrect test in xfs_alloc_ag_vextent_lastblockDarrick J. Wong
When I lifted the code in xfs_alloc_ag_vextent_lastblock out of a loop, I forgot to convert all the accesses to len to be pointer dereferences. Coverity-id: 1457918 Fixes: 5113f8ec3753ed ("xfs: clean up weird while loop in xfs_alloc_ag_vextent_near") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-17ovl: fix a typo in commentChengguang Xu
Fix a typo in comment. (annonate -> annotate) Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Fixes: cbe7fba8edfc ("ovl: make sure that real fid is 32bit aligned in memory") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for oldAl Viro
The situation is the same as for __d_obtain_alias() (which is what that thing is parallel to) - if we find a preexisting alias, we want to grab it, drop the inode and return the alias we'd found. The only thing d_instantiate_anon() does compared to that is spurious security_d_instiate() that has already been done to that dentry with exact same arguments. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: strict upper fs requirements for remote upper fsAmir Goldstein
Overlayfs works sub-optimally with upper fs that has no xattr/d_type/ RENAME_WHITEOUT support. We should basically deprecate support for those filesystems, but so far, we only issue a warning and don't fail the mount for the sake of backward compat. Some features are already being disabled with no xattr support. For newly supported remote upper fs, we do not need to worry about backward compatibility, so we can fail the mount if upper fs is a sub-optimal filesystem. This reduces the in-tree remote filesystems supported as upper to just FUSE, for which the remote upper fs support was added. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: check if upper fs supports RENAME_WHITEOUTAmir Goldstein
As with other required upper fs features, we only warn if support is missing to avoid breaking existing sub-optimal setups. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: allow remote upperMiklos Szeredi
No reason to prevent upper layer being a remote filesystem. Do the revalidation in that case, just as we already do for lower layers. This lets virtiofs be used as upper layer, which appears to be a real use case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: decide if revalidate needed on a per-dentry basisMiklos Szeredi
Allow completely skipping ->revalidate() on a per-dentry basis, in case the underlying layers used for a dentry do not themselves have ->revalidate(). E.g. negative overlay dentry has no underlying layers, hence revalidate is unnecessary. Or if lower layer is remote but overlay dentry is pure-upper, then can skip revalidate. The following places need to update whether the dentry needs revalidate or not: - fill-super (root dentry) - lookup - create - fh_to_dentry Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: separate detection of remote upper layer from stacked overlayMiklos Szeredi
Following patch will allow remote as upper layer, but not overlay stacked on upper layer. Separate the two concepts. This patch is doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: restructure dentry revalidationMiklos Szeredi
Use a common loop for plain and weak revalidation. This will aid doing revalidation on upper layer. This patch doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: ignore failure to copy up unknown xattrsMiklos Szeredi
This issue came up with NFSv4 as the lower layer, which generates "system.nfs4_acl" xattrs (even for plain old unix permissions). Prior to this patch this prevented copy-up from succeeding. The overlayfs permission model mandates that permissions are checked locally for the task and remotely for the mounter(*). NFS4 ACLs are not supported by the Linux kernel currently, hence they cannot be enforced locally. Which means it is indifferent whether this attribute is copied or not. Generalize this to any xattr that is not used in access checking (i.e. it's not a POSIX ACL and not in the "security." namespace). Incidentally, best effort copying of xattrs seems to also be the behavior of "cp -a", which is what overlayfs tries to mimic. (*) Documentation/filesystems/overlayfs.txt#Permission model Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: simplify i_ino initializationAmir Goldstein
Move i_ino initialization to ovl_inode_init() to avoid the dance of setting i_ino in ovl_fill_inode() sometimes on the first call and sometimes on the seconds call. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: factor out helper ovl_get_root()Amir Goldstein
Allocates and initializes the root dentry and inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: fix out of date comment and unreachable codeAmir Goldstein
ovl_inode_update() is no longer called from create object code path. Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: fix value of i_ino for lower hardlink corner caseAmir Goldstein
Commit 6dde1e42f497 ("ovl: make i_ino consistent with st_ino in more cases"), relaxed the condition nfs_export=on in order to set the value of i_ino to xino map of real ino. Specifically, it also relaxed the pre-condition that index=on for consistent i_ino. This opened the corner case of lower hardlink in ovl_get_inode(), which calls ovl_fill_inode() with ino=0 and then ovl_init_inode() is called to set i_ino to lower real ino without the xino mapping. Pass the correct values of ino;fsid in this case to ovl_fill_inode(), so it can initialize i_ino correctly. Fixes: 6dde1e42f497 ("ovl: make i_ino consistent with st_ino in more ...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ext2: fix debug reference to ext2_xattr_cacheJan Kara
Fix a debug-only build error in ext2/xattr.c: When building without extra debugging, (and with another patch that uses no_printk() instead of <empty> for the ext2-xattr debug-print macros, this build error happens: ../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’: ../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in this function); did you mean ‘ext2_xattr_list’? atomic_read(&ext2_xattr_cache->c_entry_count)); Fix the problem by removing cached entry count from the debug message since otherwise we'd have to export the mbcache structure just for that. Fixes: be0726d33cb8 ("ext2: convert to mbcache2") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-16kernfs: Add option to enable user xattrsDaniel Xu
User extended attributes are useful as metadata storage for kernfs consumers like cgroups. Especially in the case of cgroups, it is useful to have a central metadata store that multiple processes/services can use to coordinate actions. A concrete example is for userspace out of memory killers. We want to let delegated cgroup subtree owners (running as non-root) to be able to say "please avoid killing this cgroup". This is especially important for desktop linux as delegated subtrees owners are less likely to run as root. This patch introduces a new flag, KERNFS_ROOT_SUPPORT_USER_XATTR, that lets kernfs consumers enable user xattr support. An initial limit of 128 entries or 128KB -- whichever is hit first -- is placed per cgroup because xattrs come from kernel memory and we don't want to let unprivileged users accidentally eat up too much kernel memory. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-16kernfs: Add removed_size out param for simple_xattr_setDaniel Xu
This helps set up size accounting in the next commit. Without this out param, it's difficult to find out the removed xattr size without taking a lock for longer and walking the xattr linked list twice. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-16kernfs: kvmalloc xattr value instead of kmallocDaniel Xu
xattr values have a 64k maximum size. This can result in an order 4 kmalloc request which can be difficult to fulfill. Since xattrs do not need physically contiguous memory, we can switch to kvmalloc and not have to worry about higher order allocations failing. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>