summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-12-11xfs: stabilize insert range start boundary to avoid COW writeback raceBrian Foster
generic/522 (fsx) occasionally fails with a file corruption due to an insert range operation. The primary characteristic of the corruption is a misplaced insert range operation that differs from the requested target offset. The reason for this behavior is a race between the extent shift sequence of an insert range and a COW writeback completion that causes a front merge with the first extent in the shift. The shift preparation function flushes and unmaps from the target offset of the operation to the end of the file to ensure no modifications can be made and page cache is invalidated before file data is shifted. An insert range operation then splits the extent at the target offset, if necessary, and begins to shift the start offset of each extent starting from the end of the file to the start offset. The shift sequence operates at extent level and so depends on the preparation sequence to guarantee no changes can be made to the target range during the shift. If the block immediately prior to the target offset was dirty and shared, however, it can undergo writeback and move from the COW fork to the data fork at any point during the shift. If the block is contiguous with the block at the start offset of the insert range, it can front merge and alter the start offset of the extent. Once the shift sequence reaches the target offset, it shifts based on the latest start offset and silently changes the target offset of the operation and corrupts the file. To address this problem, update the shift preparation code to stabilize the start boundary along with the full range of the insert. Also update the existing corruption check to fail if any extent is shifted with a start offset behind the target offset of the insert range. This prevents insert from racing with COW writeback completion and fails loudly in the event of an unexpected extent shift. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-11Merge tag 'erofs-for-5.5-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Mainly address a regression reported by David recently observed together with overlayfs due to the improper return value of listxattr() without xattr. Update outdated expressions in document as well. Summary: - Fix improper return value of listxattr() with no xattr - Keep up documentation with latest code" * tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: update documentation erofs: zero out when listxattr is called with no xattr
2019-12-11pipe: simplify signal handling in pipe_read() and add commentsLinus Torvalds
There's no need to separately check for signals while inside the locked region, since we're going to do "wait_event_interruptible()" right afterwards anyway, and the error handling is much simpler there. The check for whether we had already read anything was also redundant, since we no longer do the odd merging of reads when there are pending writers. But perhaps more importantly, this adds commentary about why we still need to wake up possible writers even though we didn't read any data, and why we can skip all the finishing touches now if we get a signal (or had a signal pending) while waiting for more data. [ This is a split-out cleanup from my "make pipe IO use exclusive wait queues" thing, which I can't apply because it triggers a nasty bug in the GNU make jobserver - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-11afs: Show volume name in /proc/net/afs/<cell>/volumesDavid Howells
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it easier to work out the name corresponding to a volume ID. This makes it easier to work out which mounts in /proc/mounts correspond to which volume ID. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
2019-12-11afs: Fix missing cell comparison in afs_test_super()David Howells
Fix missing cell comparison in afs_test_super(). Without this, any pair volumes that have the same volume ID will share a superblock, no matter the cell, unless they're in different network namespaces. Normally, most users will only deal with a single cell and so they won't see this. Even if they do look into a second cell, they won't see a problem unless they happen to hit a volume with the same ID as one they've already got mounted. Before the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0 After the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ admin/ common/ install/ OldFiles/ service/ system/ bakrestores/ home/ misc/ pkg/ src/ wsadmin/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0 Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Carsten Jacobi <jacobi@de.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org> cc: Todd DeSantis <atd@us.ibm.com>
2019-12-11afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPPDavid Howells
Fix the lookup method on the dynamic root directory such that creation calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP rather than failing with some odd error (such as EEXIST). lookup() itself tries to create automount directories when it is invoked. These are cached locally in RAM and not committed to storage. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11afs: Fix mountpoint parsingDavid Howells
Each AFS mountpoint has strings that define the target to be mounted. This is required to end in a dot that is supposed to be stripped off. The string can include suffixes of ".readonly" or ".backup" - which are supposed to come before the terminal dot. To add to the confusion, the "fs lsmount" afs utility does not show the terminal dot when displaying the string. The kernel mount source string parser, however, assumes that the terminal dot marks the suffix and that the suffix is always "" and is thus ignored. In most cases, there is no suffix and this is not a problem - but if there is a suffix, it is lost and this affects the ability to mount the correct volume. The command line mount command, on the other hand, is expected not to include a terminal dot - so the problem doesn't arise there. Fix this by making sure that the dot exists and then stripping it when passing the string to the mount configuration. Fixes: bec5eb614130 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11crypto: skcipher - remove crypto_skcipher::keysizeEric Biggers
Due to the removal of the blkcipher and ablkcipher algorithm types, crypto_skcipher::keysize is now redundant since it always equals crypto_skcipher_alg(tfm)->max_keysize. Remove it and update crypto_skcipher_default_keysize() accordingly. Also rename crypto_skcipher_default_keysize() to crypto_skcipher_max_keysize() to clarify that it specifically returns the maximum key size, not some unspecified "default". Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-12-11sched/cputime, proc/stat: Fix incorrect guest nice cpustat valueFlavio Leitner
The value being used for guest_nice should be CPUTIME_GUEST_NICE and not CPUTIME_USER. Fixes: 26dae145a76c ("procfs: Use all-in-one vtime aware kcpustat accessor") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191205020344.14940-1-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystemsAl Viro
two requirements: no file creations in IS_DEADDIR and no cross-directory renames whatsoever. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-10f2fs: call f2fs_balance_fs outside of locked pageJaegeuk Kim
Otherwise, we can hit deadlock by waiting for the locked page in move_data_block in GC. Thread A Thread B - do_page_mkwrite - f2fs_vm_page_mkwrite - lock_page - f2fs_balance_fs - mutex_lock(gc_mutex) - f2fs_gc - do_garbage_collect - ra_data_block - grab_cache_page - f2fs_balance_fs - mutex_lock(gc_mutex) Fixes: 39a8695824510 ("f2fs: refactor ->page_mkwrite() flow") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-10io_uring: add sockets to list of files that support non-blocking issueJens Axboe
In chasing a performance issue between using IORING_OP_RECVMSG and IORING_OP_READV on sockets, tracing showed that we always punt the socket reads to async offload. This is due to io_file_supports_async() not checking for S_ISSOCK on the inode. Since sockets supports the O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list of file types that we can do a non-blocking issue to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: only hash regular files for async work executionJens Axboe
We hash regular files to avoid having multiple threads hammer on the inode mutex, but it should not be needed on other types of files (like sockets). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: run next sqe inline if possibleJens Axboe
One major use case of linked commands is the ability to run the next link inline, if at all possible. This is done correctly for async offload, but somewhere along the line we lost the ability to do so when we were able to complete a request without having to punt it. Ensure that we do so correctly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: don't dynamically allocate poll dataJens Axboe
This essentially reverts commit e944475e6984. For high poll ops workloads, like TAO, the dynamic allocation of the wait_queue entry for IORING_OP_POLL_ADD adds considerable extra overhead. Go back to embedding the wait_queue_entry, but keep the usage of wait->private for the pointer stashing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: deferred send/recvmsg should assign iovJens Axboe
Don't just assign it from the main call path, that can miss the case when we're called from issue deferral. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: sqthread should grab ctx->uring_lock for submissionsJens Axboe
We use the mutex to guard against registered file updates, for instance. Ensure we're safe in accessing that state against concurrent updates. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: briefly spin for new work after finishing workJens Axboe
To avoid going to sleep only to get woken shortly thereafter, spin briefly for new work upon completion of work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: remove worker->wait waitqueueJens Axboe
We only have one cases of using the waitqueue to wake the worker, the rest are using wake_up_process(). Since we can save some cycles not fiddling with the waitqueue io_wqe_worker(), switch the work activation to task wakeup and get rid of the now unused wait_queue_head_t in struct io_worker. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: allow unbreakable linksJens Axboe
Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set, they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10ovl: relax WARN_ON() on rename to selfAmir Goldstein
In ovl_rename(), if new upper is hardlinked to old upper underneath overlayfs before upper dirs are locked, user will get an ESTALE error and a WARN_ON will be printed. Changes to underlying layers while overlayfs is mounted may result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON(). Reported-by: syzbot+bb1836a212e69f8e201a@syzkaller.appspotmail.com Fixes: 804032fabb3b ("ovl: don't check rename to self") Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix corner case of non-unique st_dev;st_inoAmir Goldstein
On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs and pure upper inodes use the real upper st_dev. It is fine for an overlay pure upper inode to use the same st_dev;st_ino values as the real upper inode, because the content of those two different filesystem objects is always the same. In this case, however: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the same st_dev;st_ino values as the real lower inode. This may result with a false positive results of 'diff' between the real lower and copied up overlay inode. Fix this by using the upper st_dev;st_ino values in this case. This breaks the property of constant st_dev;st_ino across copy up of this case. This breakage will be fixed by a later patch. Fixes: 5148626b806a ("ovl: allocate anon bdev per unique lower fs") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: don't use a temp buf for encoding real fhAmir Goldstein
We can allocate maximum fh size and encode into it directly. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: make sure that real fid is 32bit aligned in memoryAmir Goldstein
Seprate on-disk encoding from in-memory and on-wire resresentation of overlay file handle. In-memory and on-wire we only ever pass around pointers to struct ovl_fh, which encapsulates at offset 3 the on-disk format struct ovl_fb. struct ovl_fb encapsulates at offset 21 the real file handle. That makes sure that the real file handle is always 32bit aligned in-memory when passed down to the underlying filesystem. On-disk format remains the same and store/load are done into correctly aligned buffer. New nfs exported file handles are exported with aligned real fid. Old nfs file handles are copied to an aligned buffer before being decoded. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10Merge tag 'v5.5-rc1' into core/kprobes, to resolve conflictsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-09smb3: fix refcount underflow warning on unmount when no directory leasesSteve French
Fix refcount underflow warning when unmounting to servers which didn't grant directory leases. [ 301.680095] refcount_t: underflow; use-after-free. [ 301.680192] WARNING: CPU: 1 PID: 3569 at lib/refcount.c:28 refcount_warn_saturate+0xb4/0xf3 ... [ 301.682139] Call Trace: [ 301.682240] close_shroot+0x97/0xda [cifs] [ 301.682351] SMB2_tdis+0x7c/0x176 [cifs] [ 301.682456] ? _get_xid+0x58/0x91 [cifs] [ 301.682563] cifs_put_tcon.part.0+0x99/0x202 [cifs] [ 301.682637] ? ida_free+0x99/0x10a [ 301.682727] ? cifs_umount+0x3d/0x9d [cifs] [ 301.682829] cifs_put_tlink+0x3a/0x50 [cifs] [ 301.682929] cifs_umount+0x44/0x9d [cifs] Fixes: 72e73c78c446 ("cifs: close the shared root handle on tree disconnect") Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
2019-12-09f2fs: preallocate DIO blocks when forcing buffered_ioJaegeuk Kim
The previous preallocation and DIO decision like below. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) No_Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO But, Javier reported Case (*) where zoned device bypassed preallocation but fell back to buffered writes in f2fs_direct_IO(), resulting in stale data being read. In order to fix the issue, actually we need to preallocate blocks whenever we fall back to buffered IO like this. No change is made in the other cases. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO Reported-and-tested-by: Javier Gonzalez <javier@javigon.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-09ceph: add more debug info when decoding mdsmapXiubo Li
Show the laggy state. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: switch to global cap helperXiubo Li
__ceph_is_any_caps is a duplicate helper. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: trigger the reclaim work once there has enough pending capsXiubo Li
The nr in ceph_reclaim_caps_nr() is very possibly larger than 1, so we may miss it and the reclaim work couldn't triggered as expected. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: show tasks waiting on caps in debugfs caps fileJeff Layton
Add some visibility of tasks that are waiting for caps to the "caps" debugfs file. Display the tgid of the waiting task, inode number, and the caps the task needs and wants. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: convert int fields in ceph_mount_options to unsigned intJeff Layton
Most of these values should never be negative, so convert them to unsigned values. Add some sanity checking to the parsed values, and clean up some unneeded casts. Note that while caps_max should never be negative, this patch leaves it signed, since this value ends up later being compared to a signed counter. Just ensure that userland never passes in a negative value for caps_max. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09fs/ext4/inode-test: Fix inode test on 32 bit platforms.Iurii Zaikin
Fixes the issue caused by the fact that in C in the expression of the form -1234L only 1234L is the actual literal, the unary minus is an operation applied to the literal. Which means that to express the lower bound for the type one has to negate the upper bound and subtract 1. Original error: Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == -2147483648 timestamp.tv_sec == 2147483648 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits: msb:1 lower_bound:1 extra_bits: 0 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 2147483648 timestamp.tv_sec == 6442450944 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on: msb:1 lower_bound:1 extra_bits: 1 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 6442450944 timestamp.tv_sec == 10737418240 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on: msb:1 lower_bound:1 extra_bits: 2 not ok 1 - inode_test_xtimestamp_decoding not ok 1 - ext4_inode_test Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Iurii Zaikin <yzaikin@google.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-09btrfs: add Kconfig dependency for BLAKE2BDavid Sterba
Because the BLAKE2B code went through a different tree, it was not available at the time the btrfs part was merged. Now that the Kconfig symbol exists, add it to the list. Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-09nfsd4: avoid NULL deference on strange COPY compoundsJ. Bruce Fields
With cross-server COPY we've introduced the possibility that the current or saved filehandle might not have fh_dentry/fh_export filled in, but we missed a place that assumed it was. I think this could be triggered by a compound like: PUTFH(foreign filehandle) GETATTR SAVEFH COPY First, check_if_stalefh_allowed sets no_verify on the first (PUTFH) op. Then op_func = nfsd4_putfh runs and leaves current_fh->fh_export NULL. need_wrongsec_check returns true, since this PUTFH has OP_IS_PUTFH_LIKE set and GETATTR does not have OP_HANDLES_WRONGSEC set. We should probably also consider tightening the checks in check_if_stalefh_allowed and double-checking that we don't assume the filehandle is verified elsewhere in the compound. But I think this fixes the immediate issue. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 4e48f1cccab3 "NFSD: allow inter server COPY to have... " Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09NFSD fixing possible null pointer derefering in copy offloadOlga Kornievskaia
Static checker revealed possible error path leading to possible NULL pointer dereferencing. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: e0639dc5805a: ("NFSD introduce async copy feature") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09NFSD fix nfserro errno mismatchOlga Kornievskaia
There is mismatch between __be32 and u32 in nfserr and errno. Reported-by: kbuild test robot <lkp@intel.com> Fixes: d5e54eeb0e3d ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09NFSD: fix seqid in copy stateidOlga Kornievskaia
s_stid->si_generation is a u32, copy->stateid.seqid is a __be32, so we should be byte-swapping here if necessary. This effectively undoes the byte-swap performed when reading s_stid->s_generation in nfsd4_decode_copy(). Without this second swap, the stateid we sent to the source in READ could be different from the one the client provided us in the COPY. We didn't spot this in testing since our implementation always uses a 0 in the seqid field. But other implementations might not do that. You'd think we should just skip the byte-swapping entirely, but the s_stid field can be used for either our own stateids (in the intra-server case) or foreign stateids (in the inter-server case), and the former are interpreted by us and need byte-swapping. Reported-by: kbuild test robot <lkp@intel.com> Fixes: d5e54eeb0e3d ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09NFSD fix mismatching type in nfsd4_set_netaddrOlga Kornievskaia
Fix __be32 and u32 mismatch in return and assignment. Reported-by: kbuild test robot <lkp@intel.com> Fixes: dbd4c2dd8f13 ("NFSD add COPY_NOTIFY operation") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09nfsd: unlock on error in manage_cpntf_state()Dan Carpenter
We are holding the "nn->s2s_cp_lock" so we can't return directly without unlocking first. Fixes: f3dee17721a0 ("NFSD check stateids against copy stateids") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-09NFSD add nfs4 inter ssc to nfsd4_copyOlga Kornievskaia
Given a universal address, mount the source server from the destination server. Use an internal mount. Call the NFS client nfs42_ssc_open to obtain the NFS struct file suitable for nfsd_copy_range. Ability to do "inter" server-to-server depends on the an nfsd kernel parameter "inter_copy_offload_enable". Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-12-09NFSD: allow inter server COPY to have a STALE source server fhOlga Kornievskaia
The inter server to server COPY source server filehandle is a foreign filehandle as the COPY is sent to the destination server. Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-12-09NFSD generalize nfsd4_compound_state flag namesOlga Kornievskaia
Allow for sid_flag field non-stateid use. Signed-off-by: Andy Adamson <andros@netapp.com>
2019-12-09NFSD check stateids against copy stateidsOlga Kornievskaia
Incoming stateid (used by a READ) could be a saved copy stateid. Using the provided stateid, look it up in the list of copy_notify stateids. If found, use the parent's stateid and parent's clid to look up the parent's stid to do the appropriate checks. Update the copy notify timestamp (cpntf_time) with current time this making it 'active' so that laundromat thread will not delete copy notify state. Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-12-09NFSD add COPY_NOTIFY operationOlga Kornievskaia
Introducing the COPY_NOTIFY operation. Create a new unique stateid that will keep track of the copy state and the upcoming READs that will use that stateid. Each associated parent stateid has a list of copy notify stateids. A copy notify structure makes a copy of the parent stateid and a clientid and will use it to look up the parent stateid during the READ request (suggested by Trond Myklebust <trond.myklebust@hammerspace.com>). At nfs4_put_stid() time, we walk the list of the associated copy notify stateids and delete them. Laundromat thread will traverse globally stored copy notify stateid in idr and notice if any haven't been referenced in the lease period, if so, it'll remove them. Return single netaddr to advertise to the copy. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com>
2019-12-09NFSD COPY_NOTIFY xdrOlga Kornievskaia
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-12-09NFSD add ca_source_server<> to COPYOlga Kornievskaia
Decode the ca_source_server list that's sent but only use the first one. Presence of non-zero list indicates an "inter" copy. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-12-09NFSD fill-in netloc4 structureOlga Kornievskaia
nfs.4 defines nfs42_netaddr structure that represents netloc4. Populate needed fields from the sockaddr structure. This will be used by flexfiles and 4.2 inter copy Signed-off-by: Olga Kornievskaia <kolga@netapp.com>