summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-02-10Btrfs: fix btrfs_decompress_buf2page()Omar Sandoval
If btrfs_decompress_buf2page() is handed a bio with its page in the middle of the working buffer, then we adjust the offset into the working buffer. After we copy into the bio, we advance the iterator by the number of bytes we copied. Then, we have some logic to handle the case of discontiguous pages and adjust the offset into the working buffer again. However, if we didn't advance the bio to a new page, we may enter this case in error, essentially repeating the adjustment that we already made when we entered the function. The end result is bogus data in the bio. Previously, we only checked for this case when we advanced to a new page, but the conversion to bio iterators changed that. This restores the old, correct behavior. A case I saw when testing with zlib was: buf_start = 42769 total_out = 46865 working_bytes = total_out - buf_start = 4096 start_byte = 45056 The condition (total_out > start_byte && buf_start < start_byte) is true, so we adjust the offset: buf_offset = start_byte - buf_start = 2287 working_bytes -= buf_offset = 1809 current_buf_start = buf_start = 42769 Then, we copy bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809 buf_offset += bytes = 4096 working_bytes -= bytes = 0 current_buf_start += bytes = 44578 After bio_advance(), we are still in the same page, so start_byte is the same. Then, we check (total_out > start_byte && current_buf_start < start_byte), which is true! So, we adjust the values again: buf_offset = start_byte - buf_start = 2287 working_bytes = total_out - start_byte = 1809 current_buf_start = buf_start + buf_offset = 45056 But note that working_bytes was already zero before this, so we should have stopped copying. Fixes: 974b1adc3b10 ("btrfs: use bio iterators for the decompression handlers") Reported-by: Pat Erley <pat-lkml@erley.org> Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Liu Bo <bo.li.liu@oracle.com>
2017-02-10Merge tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd revert from Bruce Fields: "This patch turned out to have a couple problems. The problems are fixable, but at least one of the fixes is a little ugly. The original bug has always been there, so we can wait another week or two to get this right" * tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux: nfsd: Revert "nfsd: special case truncates some more"
2017-02-10afs: Use core kernel UUID generationArnd Bergmann
AFS uses a time based UUID to identify the host itself. This requires getting a timestamp which is currently done through the getnstimeofday() interface that we want to eventually get rid of. Instead of replacing it with a ktime-based interface, simply remove the entire function and use generate_random_uuid() instead, which has a v4 ("completely random") UUID instead of the time-based one. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2017-02-10afs: Move UUID struct to linux/uuid.hDavid Howells
Move the afs_uuid struct to linux/uuid.h, rename it to uuid_v1 and change the u16/u32 fields to __be16/__be32 instead so that the structure can be cast to a 16-octet network-order buffer. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de
2017-02-10kernfs: handle null pointers while printing node name and pathKonstantin Khlebnikov
Null kernfs nodes could be found at cgroups during construction. It seems safer to handle these null pointers right in kernfs in the same way as printf prints "(null)" for null pointer string. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-10timerfd: Protect the might cancel mechanism properThomas Gleixner
The handling of the might_cancel queueing is not properly protected, so parallel operations on the file descriptor can race with each other and lead to list corruptions or use after free. Protect the context for these operations with a seperate lock. The wait queue lock cannot be reused for this because that would create a lock inversion scenario vs. the cancel lock. Replacing might_cancel with an atomic (atomic_t or atomic bit) does not help either because it still can race vs. the actual list operation. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "linux-fsdevel@vger.kernel.org" Cc: syzkaller <syzkaller@googlegroups.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311521430.3457@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10ext4: do not use stripe_width if it is not setJan Kara
Avoid using stripe_width for sbi->s_stripe value if it is not actually set. It prevents using the stride for sbi->s_stripe. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-10ext4: fix stripe-unaligned allocationsJan Kara
When a filesystem is created using: mkfs.ext4 -b 4096 -E stride=512 <dev> and we try to allocate 64MB extent, we will end up directly in ext4_mb_complex_scan_group(). This is because the request is detected as power-of-two allocation (so we start in ext4_mb_regular_allocator() with ac_criteria == 0) however the check before ext4_mb_simple_scan_group() refuses the direct buddy scan because the allocation request is too large. Since cr == 0, the check whether we should use ext4_mb_scan_aligned() fails as well and we fall back to ext4_mb_complex_scan_group(). Fix the problem by checking for upper limit on power-of-two requests directly when detecting them. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-09nfsd: Revert "nfsd: special case truncates some more"J. Bruce Fields
This patch incorrectly attempted nested mnt_want_write, and incorrectly disabled nfsd's owner override for truncate. We'll fix those problems and make another attempt soon, for the moment I think the safest is to revert. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-09pstore: don't OOPS when there are no ftrace zonesBrian Norris
We'll OOPS in ramoops_get_next_prz() if the platform didn't ask for any ftrace zones (i.e., cxt->fprzs will be NULL). Let's just skip this entire FTRACE section if there's no 'fprzs'. Regression seen on a coreboot/depthcharge-based Chromebook. Fixes: 2fbea82bbb89 ("pstore: Merge per-CPU ftrace records into one") Cc: Joel Fernandes <joelaf@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-09orangefs: fix buffer size mis-match between kernel space and user space.Mike Marshall
The deamon through which the kernel module communicates with the userspace part of Orangefs, the "client-core", sends initialization data to the kernel module with ioctl. The initialization data was built by the client-core in a 2k buffer and copy_from_user'd into a 1k buffer in the kernel module. When more than 1k of initialization data needed to be sent, some was lost, reducing the usability of the control by which debug levels are set. This patch sets the kernel side buffer to 2K to match the userspace side... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-02-09orangefs: Dan Carpenter influenced cleanups...Mike Marshall
This patch is simlar to one Dan Carpenter sent me, cleans up some return codes and whitespace errors. There was one place where he thought inserting an error message into the ring buffer might be too chatty, I hope I convinced him othewise. As a consolation <g> I changed a truly chatty error message in another location into a debug message, system-admins had already yelled at me about that one... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-02-09xfs: don't block the log commit handler for discardsChristoph Hellwig
Instead we submit the discard requests and use another workqueue to release the extents from the extent busy list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09xfs: improve busy extent sortingChristoph Hellwig
Sort busy extents by the full block number instead of just the AGNO so that we can issue consecutive discard requests that the block layer could merge (although we'll need additional block layer fixes for fast devices). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09NFSv4: Set the connection timeout to match the lease periodTrond Myklebust
Set the timeout for TCP connections to be 1 lease period to ensure that we don't lose our lease due to a faulty TCP connection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-09xfs: improve handling of busy extents in the low-level allocatorChristoph Hellwig
Currently we force the log and simply try again if we hit a busy extent, but especially with online discard enabled it might take a while after the log force for the busy extents to disappear, and we might have already completed our second pass. So instead we add a new waitqueue and a generation counter to the pag structure so that we can do wakeups once we've removed busy extents, and we replace the single retry with an unconditional one - after all we hold the AGF buffer lock, so no other allocations or frees can be racing with us in this AG. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09xfs: don't fail xfs_extent_busy allocationChristoph Hellwig
We don't just need the structure to track busy extents which can be avoided with a synchronous transaction, but also to keep track of pending discard. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09xfs: correct null checks and error processing in xfs_initialize_peragBill O'Donnell
If pag cannot be allocated, the current error exit path will trip a null pointer deference error when calling xfs_buf_hash_destroy with a null pag. Fix this by adding a new error exit labels and jumping to those accordingly, avoiding the hash destroy and unnecessary kmem_free on pag. Up to three things need to be properly unwound: 1) pag memory allocation 2) xfs_buf_hash_init 3) radix_tree_insert For any given iteration through the loop, any of the above which succeed must be unwound for /this/ pag, and then all prior initialized pags must be unwound. Addresses-Coverity-Id: 1397628 ("Dereference after null check") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09xfs: update ctime and mtime on clone destinatation inodesChristoph Hellwig
We're changing both metadata and data, so we need to update the timestamps for clone operations. Dedupe on the other hand does not change file data, and only changes invisible metadata so the timestamps should not be updated. This follows existing btrfs behavior. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: remove redundant is_dedupe test] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09fanotify: simplify the code of fanotify_mergeKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-08NFSv4: Fix memory and state leak in _nfs4_open_and_get_stateTrond Myklebust
If we exit because the file access check failed, we currently leak the struct nfs4_state. We need to attach it to the open context before returning. Fixes: 3efb9722475e ("NFSv4: Refactor _nfs4_open_and_get_state..") Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08sunrpc/nfs: cleanup procfs/pipefs entry in cache_detailKinglong Mee
Record flush/channel/content entries is useless, remove them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08nfs: no PG_private waiters remain, remove wakerNicholas Piggin
Since commit 4f52b6bb ("NFS: Don't call COMMIT in ->releasepage()"), no tasks wait on PagePrivate, so the wake introduced in commit 95905446 ("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") can be removed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08NFS: nfs_rename() handle -ERESTARTSYS dentry left behindBenjamin Coddington
An interrupted rename will leave the old dentry behind if the rename succeeds. Fix this by moving the final local work of the rename to rpc_call_done so that the results of the RENAME can always be handled, even if the original process has already returned with -ERESTARTSYS. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08dax: assert that i_rwsem is held exclusive for writesChristoph Hellwig
Make sure all callers follow the same locking protocol, given that DAX transparantly replaced the normal buffered I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08ext4: fix DAX write lockingChristoph Hellwig
Unlike O_DIRECT DAX is not an optional opt-in feature selected by the application, so we'll have to provide the traditional synchronŃ–zation of overlapping writes as we do for buffered writes. This was broken historically for DAX, but got fixed for ext2 and XFS as part of the iomap conversion. Fix up ext4 as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08btrfs: fix btrfs_compat_ioctl failures on non-compat ioctlsJeff Mahoney
Commit 4c63c2454ef incorrectly assumed that returning -ENOIOCTLCMD would cause the native ioctl to be called. The ->compat_ioctl callback is expected to handle all ioctls, not just compat variants. As a result, when using 32-bit userspace on 64-bit kernels, everything except those three ioctls would return -ENOTTY. Fixes: 4c63c2454ef ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl") Cc: stable@vger.kernel.org Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-08fscrypt: constify struct fscrypt_operationsEric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflict was an interaction between a bug fix in the netvsc driver in 'net' and an optimization of the RX path in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07mm: fix KPF_SWAPCACHE in /proc/kpageflagsHugh Dickins
Commit 6326fec1122c ("mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and depending on PageSwapBacked being true). As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be synthesized, instead of being shown on unrelated pages which just happen to have PG_owner_priv_1 set. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-07ovl: drop CAP_SYS_RESOURCE from saved mounter's credentialsKonstantin Khlebnikov
If overlay was mounted by root then quota set for upper layer does not work because overlay now always use mounter's credentials for operations. Also overlay might deplete reserved space and inodes in ext4. This patch drops capability SYS_RESOURCE from saved credentials. This affects creation new files, whiteouts, and copy-up operations. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context") Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: properly implement sync_filesystem()Amir Goldstein
overlayfs syncs all inode pages on sync_filesystem(), but it also needs to call s_op->sync_fs() of upper fs for metadata sync. This fixes correctness of syncfs(2) as demonstrated by following xfs specific test: xfs_sync_stats() { echo $1 echo -n "xfs_log_force = " grep log /proc/fs/xfs/stat | awk '{ print $5 }' } xfs_sync_stats "before touch" touch x xfs_sync_stats "after touch" xfs_io -c syncfs . xfs_sync_stats "after syncfs" xfs_io -c fsync x xfs_sync_stats "after fsync" xfs_io -c fsync x xfs_sync_stats "after fsync #2" When this test is run in overlay mount over xfs, log force count does not increase with syncfs command. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: concurrent copy up of regular filesAmir Goldstein
Now that copy up of regular file is done using O_TMPFILE, we don't need to hold rename_lock throughout copy up. Use the copy up waitqueue to synchronize concurrent copy up of the same file. Different regular files can be copied up concurrently. The upper dir inode_lock is taken instead of rename_lock, because it is needed for lookup and later for linking the temp file, but it is released while copying up data. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: introduce copy up waitqueueAmir Goldstein
The overlay sb 'copyup_wq' and overlay inode 'copying' condition variable are about to replace the upper sb rename_lock, as finer grained synchronization objects for concurrent copy up. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: copy up regular file using O_TMPFILEAmir Goldstein
In preparation for concurrent copy up, implement copy up of regular file as O_TMPFILE that is linked to upperdir instead of a file in workdir that is moved to upperdir. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: rearrange code in ovl_copy_up_locked()Amir Goldstein
As preparation to implementing copy up with O_TMPFILE, name the variable for dentry before final rename 'temp' and assign it to 'newdentry' only after rename. Also lookup upper dentry before looking up temp dentry and move ovl_set_timestamps() into ovl_copy_up_locked(), because that is going to be more convenient for upcoming change. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07ovl: check if upperdir fs supports O_TMPFILEAmir Goldstein
This is needed for choosing between concurrent copyup using O_TMPFILE and legacy copyup using workdir+rename. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07vfs: wrap write f_ops with file_{start,end}_write()Amir Goldstein
Before calling write f_ops, call file_start_write() instead of sb_start_write(). Replace {sb,file}_start_write() for {copy,clone}_file_range() and for fallocate(). Beyond correct semantics, this avoids freeze protection to sb when operating on special inodes, such as fallocate() on a blockdev. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07vfs: deny copy_file_range() for non regular filesAmir Goldstein
There is no in-tree file system that implements copy_file_range() for non regular files. Deny an attempt to copy_file_range() a directory with EISDIR and any other non regualr file with EINVAL to conform with behavior of vfs_{clone,dedup}_file_range(). This change is needed prior to converting sb_start_write() to file_start_write() in the vfs helper. Cc: linux-api@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07vfs: deny fallocate() on directoryAmir Goldstein
There was an obscure use case of fallocate of directory inode in the vfs helper with the comment: "Let individual file system decide if it supports preallocation for directories or not." But there is no in-tree file system that implements fallocate for directory operations. Deny an attempt to fallocate a directory with EISDIR error. This change is needed prior to converting sb_start_write() to file_start_write(), so freeze protection is correctly handled for cases of fallocate file and blockdev. Cc: linux-api@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07vfs: create vfs helper vfs_tmpfile()Amir Goldstein
Factor out some common vfs bits from do_tmpfile() to be used by overlayfs for concurrent copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-06fscrypt: properly declare on-stack completionRichard Weinberger
When a completion is declared on-stack we have to use COMPLETION_INITIALIZER_ONSTACK(). Fixes: 0b81d07790726 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06fscrypt: split supp and notsupp declarations into their own headersEric Biggers
Previously, each filesystem configured without encryption support would define all the public fscrypt functions to their notsupp_* stubs. This list of #defines had to be updated in every filesystem whenever a change was made to the public fscrypt functions. To make things more maintainable now that we have three filesystems using fscrypt, split the old header fscrypto.h into several new headers. fscrypt_supp.h contains the real declarations and is included by filesystems when configured with encryption support, whereas fscrypt_notsupp.h contains the inline stubs and is included by filesystems when configured without encryption support. fscrypt_common.h contains common declarations needed by both. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06fscrypt: remove redundant assignment of resColin Ian King
res is assigned to sizeof(ctx), however, this is unused and res is updated later on without that assigned value to res ever being used. Remove this redundant assignment. Fixes CoverityScan CID#1395546 "Unused value" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06xfs: allocate direct I/O COW blocks in iomap_beginChristoph Hellwig
Instead of preallocating all the required COW blocks in the high-level write code do it inside the iomap code, like we do for all other I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06xfs: go straight to real allocations for direct I/O COW writesChristoph Hellwig
When we allocate COW fork blocks for direct I/O writes we currently first create a delayed allocation, and then convert it to a real allocation once we've got the delayed one. As there is no good reason for that this patch instead makes use call xfs_bmapi_write from the COW allocation path. The only interesting bits are a few tweaks the low-level allocator to allow for this, most notably the need to remove the call to xfs_bmap_extsize_align for the cowextsize in xfs_bmap_btalloc - for the existing convert case it's a no-op, but for the direct allocation case it would blow up our block reservation way beyond what we reserved for the transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06xfs: return the converted extent in __xfs_reflink_convert_cowChristoph Hellwig
We'll need it for the direct I/O code. Also rename the function to xfs_reflink_convert_cow_extent to describe it a bit better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06xfs: introduce xfs_aligned_fsb_countChristoph Hellwig
Factor a helper to calculate the extent-size aligned block out of the iomap code, so that it can be reused by the upcoming reflink dio code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06xfs: reject all unaligned direct writes to reflinked filesChristoph Hellwig
We currently fall back from direct to buffered writes if we detect a remaining shared extent in the iomap_begin callback. But by the time iomap_begin is called for the potentially unaligned end block we might have already written most of the data to disk, which we'd now write again using buffered I/O. To avoid this reject all writes to reflinked files before starting I/O so that we are guaranteed to only write the data once. The alternative would be to unshare the unaligned start and/or end block before doing the I/O. I think that's doable, and will actually be required to support reflinks on DAX file system. But it will take a little more time and I'd rather get rid of the double write ASAP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>