summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-09-30Compiler Attributes: ext4: remove local __nonstring definitionMiguel Ojeda
Commit 072ebb3bffe6 ("ext4: add nonstring annotations to ext4.h") introduced a local definition of __nonstring to suppress some false positives in gcc 8's -Wstringop-truncation. Since now we support __nonstring for everyone, remove it. Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # on top of v4.19-rc5, clang 7 Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-09-30pstore/ram: Fix failure-path memory leak in ramoops_initKees Cook
As reported by nixiaoming, with some minor clarifications: 1) memory leak in ramoops_register_dummy(): dummy_data = kzalloc(sizeof(*dummy_data), GFP_KERNEL); but no kfree() if platform_device_register_data() fails. 2) memory leak in ramoops_init(): Missing platform_device_unregister(dummy) and kfree(dummy_data) if platform_driver_register(&ramoops_driver) fails. I've clarified the purpose of ramoops_register_dummy(), and added a common cleanup routine for all three failure paths to call. Reported-by: nixiaoming <nixiaoming@huawei.com> Cc: stable@vger.kernel.org Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-09-30Merge tag 'libnvdimm-fixes2-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Dan writes: "filesystem-dax for 4.19-rc6 Fix a deadlock in the new for 4.19 dax_lock_mapping_entry() routine." * tag 'libnvdimm-fixes2-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix deadlock in dax_lock_mapping_entry()
2018-09-29xarray: Replace exceptional entriesMatthew Wilcox
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29Update email addressMatthew Wilcox
Redirect some older email addresses that are in the git logs. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-09-29iomap: set page dirty after partial delalloc on mkwriteBrian Foster
The iomap page fault mechanism currently dirties the associated page after the full block range of the page has been allocated. This leaves the page susceptible to delayed allocations without ever being set dirty on sub-page block sized filesystems. For example, consider a page fault on a page with one preexisting real (non-delalloc) block allocated in the middle of the page. The first iomap_apply() iteration performs delayed allocation on the range up to the preexisting block, the next iteration finds the preexisting block, and the last iteration attempts to perform delayed allocation on the range after the prexisting block to the end of the page. If the first allocation succeeds and the final allocation fails with -ENOSPC, iomap_apply() returns the error and iomap_page_mkwrite() fails to dirty the page having already performed partial delayed allocation. This eventually results in the page being invalidated without ever converting the delayed allocation to real blocks. This problem is reliably reproduced by generic/083 on XFS on ppc64 systems (64k page size, 4k block size). It results in leaked delalloc blocks on inode reclaim, which triggers an assert failure in xfs_fs_destroy_inode() and filesystem accounting inconsistency. Move the set_page_dirty() call from iomap_page_mkwrite() to the actor callback, similar to how the buffer head implementation works. The actor callback is called iff ->iomap_begin() returns success, so ensures the page is dirtied as soon as possible after an allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: remove invalid log recovery first/last cycle checkBrian Foster
One of the first steps of log recovery is to check for the special case of a zeroed log. If the first cycle in the log is zero or the tail portion of the log is zeroed, the head is set to the first instance of cycle 0. xlog_find_zeroed() includes a sanity check that enforces that the first cycle in the log must be 1 if the last cycle is 0. While this is true in most cases, the check is not totally valid because it doesn't consider the case where the filesystem crashed after a partial/out of order log buffer completion that wraps around the end of the physical log. For example, consider a filesystem that has completed most of the first cycle of the log, reaches the end of the physical log and splits the next single log buffer write into two in order to wrap around the end of the log. If these I/Os are reordered, the second (wrapped) I/O completes and the first happens to fail, the log is left in a state where the last cycle of the log is 0 and the first cycle is 2. This causes the xlog_find_zeroed() sanity check to fail and prevents the filesystem from mounting. This situation has been reproduced on particular systems via repeated runs of generic/475. This is an expected state that log recovery already knows how to deal with, however. Since the log is still partially zeroed, the head is detected correctly and points to a valid tail. The subsequent stale block detection clears blocks beyond the head up to the tail (within a maximum range), with the express purpose of clearing such out of order writes. As expected, this removes the out of order cycle 2 blocks at the physical start of the log. In other words, the only thing that prevents a clean mount and recovery of the filesystem in this scenario is the specific (last == 0 && first != 1) sanity check in xlog_find_zeroed(). Since the log head/tail are now independently validated via cycle, log record and CRC checks, this highly specific first cycle check is of dubious value. Remove it and rely on the higher level validation to determine whether log content is sane and recoverable. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: validate inode di_forkoffEric Sandeen
Verify the inode di_forkoff, lifted from xfs_repair's process_check_inode_forkoff(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: skip delalloc COW blocks in xfs_reflink_end_cowChristoph Hellwig
The iomap direct I/O code issues a single ->end_io call for the whole I/O request, and if some of the extents cowered needed a COW operation it will call xfs_reflink_end_cow over the whole range. When we do AIO writes we drop the iolock after doing the initial setup, but before the I/O completion. Between dropping the lock and completing the I/O we can have a racing buffered write create new delalloc COW fork extents in the region covered by the outstanding direct I/O write, and thus see delalloc COW fork extents in xfs_reflink_end_cow. As concurrent writes are fundamentally racy and no guarantees are given we can simply skip those. This can be easily reproduced with xfstests generic/208 in always_cow mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: don't treat unknown di_flags2 as corruption in scrubEric Sandeen
xchk_inode_flags2() currently treats any di_flags2 values that the running kernel doesn't recognize as corruption, and calls xchk_ino_set_corrupt() if they are set. However, it's entirely possible that these flags were set in some newer kernel and are quite valid, but ignored in this kernel. (Validators don't care one bit about unknown di_flags2.) Call xchk_ino_set_warning instead, because this may or may not actually indicate a problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: remove duplicated include from alloc.cYueHaibing
Remove duplicated include xfs_alloc.h Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: don't bring in extents in xfs_bmap_punch_delalloc_rangeChristoph Hellwig
This function is only used to punch out delayed allocations on I/O failure, which means we need to have read the extents earlier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: fix transaction leak in xfs_reflink_allocate_cow()Dave Chinner
When xfs_reflink_allocate_cow() allocates a transaction, it drops the ILOCK to perform the operation. This Introduces a race condition where another thread modifying the file can perform the COW allocation operation underneath us. This result in the retry loop finding an allocated block and jumping straight to the conversion code. It does not, however, cancel the transaction it holds and so this gets leaked. This results in a lockdep warning: ================================================ WARNING: lock held when returning to user space! 4.18.5 #1 Not tainted ------------------------------------------------ worker/6123 is leaving the kernel with locks still held! 1 lock held by worker/6123: #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220 And eventually the filesystem deadlocks because it runs out of log space that is reserved by the leaked transaction and never gets released. The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of gotos - it's no surprise that it has bug where the flow through several goto jumps then fails to clean up context from a non-obvious logic path. CLean up the logic flow and make sure every path does the right thing. Reported-by: Alexander Y. Fomichev <git.user@gmail.com> Tested-by: Alexander Y. Fomichev <git.user@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981 Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: slight refactor] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: avoid lockdep false positives in xfs_trans_allocDave Chinner
We've had a few reports of lockdep tripping over memory reclaim context vs filesystem freeze "deadlocks". They all have looked to be false positives on analysis, but it seems that they are being tripped because we take freeze references before we run a GFP_KERNEL allocation for the struct xfs_trans. We can avoid this false positive vector just by re-ordering the operations in xfs_trans_alloc(). That is. we need allocate the structure before we take the freeze reference and enter the GFP_NOFS allocation context that follows the xfs_trans around. This prevents lockdep from seeing the GFP_KERNEL allocation inside the transaction context, and that prevents it from triggering the freeze level vs alloc context vs reclaim warnings. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-09-29xfs: refactor xfs_buf_log_item reference count handlingBrian Foster
The xfs_buf_log_item structure has a reference counter with slightly tricky semantics. In the common case, a buffer is logged and committed in a transaction, committed to the on-disk log (added to the AIL) and then finally written back and removed from the AIL. The bli refcount covers two potentially overlapping timeframes: 1. the bli is held in an active transaction 2. the bli is pinned by the log The caveat to this approach is that the reference counter does not purely dictate the lifetime of the bli. IOW, when a dirty buffer is physically logged and unpinned, the bli refcount may go to zero as the log item is inserted into the AIL. Only once the buffer is written back can the bli finally be freed. The above semantics means that it is not enough for the various refcount decrementing contexts to release the bli on decrement to zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and unpin (->iop_unpin()) must all drop the associated reference and make additional checks to determine if the current context is responsible for freeing the item. For example, if a transaction holds but does not dirty a particular bli, the commit may drop the refcount to zero. If the bli itself is clean, it is also not AIL resident and must be freed at this time. The same is true for xfs_trans_brelse(). If the transaction dirties a bli and then aborts or an unpin results in an abort due to a log I/O error, the last reference count holder is expected to explicitly remove the item from the AIL and release it (since an abort means filesystem shutdown and metadata writeback will never occur). This leads to fairly complex checks being replicated in a few different places. Since ->iop_unlock() and xfs_trans_brelse() are nearly identical, refactor the logic into a common helper that implements and documents the semantics in one place. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: clean up xfs_trans_brelse()Brian Foster
xfs_trans_brelse() is a bit of a historical mess, similar to xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of commented out code, inconsistency with regard to stale items, etc. Clean up xfs_trans_brelse() to use similar logic and flow as xfs_buf_item_unlock() with regard to bli reference count handling. This patch makes no functional changes, but facilitates further refactoring of the common bli reference count handling code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: don't unlock invalidated buf on aborted tx commitBrian Foster
xfstests generic/388,475 occasionally reproduce assertion failures in xfs_buf_item_unpin() when the final bli reference is dropped on an invalidated buffer and the buffer is not locked as it is expected to be. Invalidated buffers should remain locked on transaction commit until the final unpin, at which point the buffer is removed from the AIL and the bli is freed since stale buffers are not written back. The assert failures are associated with filesystem shutdown, typically due to log I/O errors injected by the test. The problematic situation can occur if the shutdown happens to cause a race between an active transaction that has invalidated a particular buffer and an I/O error on a log buffer that contains the bli associated with the same (now stale) buffer. Both transaction and log contexts acquire a bli reference. If the transaction has already invalidated the buffer by the time the I/O error occurs and ends up aborting due to shutdown, the transaction and log hold the last two references to a stale bli. If the transaction cancel occurs first, it treats the buffer as non-stale due to the aborted state: the bli reference is dropped and the buffer is released/unlocked. The log buffer I/O error handling eventually calls into xfs_buf_item_unpin(), drops the final reference to the bli and treats it as stale. The buffer wasn't left locked by xfs_buf_item_unlock(), however, so the assert fails and the buffer is double unlocked. The latter problem is mitigated by the fact that the fs is shutdown and no further damage is possible. ->iop_unlock() of an invalidated buffer should behave consistently with respect to the bli refcount, regardless of aborted state. If the refcount remains elevated on commit, we know the bli is awaiting an unpin (since it can't be in another transaction) and will be handled appropriately on log buffer completion. If the final bli reference of an invalidated buffer is dropped in ->iop_unlock(), we can assume the transaction has aborted because invalidation implies a dirty transaction. In the non-abort case, the log would have acquired a bli reference in ->iop_pin() and prevented bli release at ->iop_unlock() time. In the abort case the item must be freed and buffer unlocked because it wasn't pinned by the log. Rework xfs_buf_item_unlock() to simplify the currently circuitous and duplicate logic and leave invalidated buffers locked based on bli refcount, regardless of aborted state. This ensures that a pinned, stale buffer is always found locked when eventually unpinned. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: remove last of unnecessary xfs_defer_cancel() callersBrian Foster
Now that deferred operations are completely managed via transactions, it's no longer necessary to cancel the dfops in error paths that already cancel the associated transaction. There are a few such calls lingering throughout the codebase. Remove all remaining unnecessary calls to xfs_defer_cancel(). This leaves xfs_defer_cancel() calls in two places. The first is the call in the transaction cancel path itself, which facilitates this patch. The second is made via the xfs_defer_finish() error path to provide consistent error semantics with transaction commit. For example, xfs_trans_commit() expects an xfs_defer_finish() failure to clean up the dfops structure before it returns. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29xfs: don't crash the vfs on a garbage inline symlinkDarrick J. Wong
The VFS routine that calls ->get_link blindly copies whatever's returned into the user's buffer. If we return a NULL pointer, the vfs will crash on the null pointer. Therefore, return -EFSCORRUPTED instead of blowing up the kernel. [dgc: clean up with hch's suggestions] Reported-by: wen.xu@gatech.edu Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-28f2fs: fix missing up_readJaegeuk Kim
This patch fixes missing up_read call. Fixes: c9b60788fc76 ("f2fs: fix to do sanity check with block address in main area") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28f2fs: return correct errno in f2fs_gcJaegeuk Kim
This fixes overriding error number in f2fs_gc. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIOJaegeuk Kim
This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during xfstests/generic/475. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28fuse: extract fuse_emit() helperMiklos Szeredi
Prepare for cache filling by introducing a helper for emitting a single directory entry. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: split out readdir.cMiklos Szeredi
Directory reading code is about to grow larger, so split it out from dir.c into a new source file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: Use hash table to link processing requestKirill Tkhai
We noticed the performance bottleneck in FUSE running our Virtuozzo storage over rdma. On some types of workload we observe 20% of times spent in request_find() in profiler. This function is iterating over long requests list, and it scales bad. The patch introduces hash table to reduce the number of iterations, we do in this function. Hash generating algorithm is taken from hash_add() function, while 256 lines table is used to store pending requests. This fixes problem and improves the performance. Reported-by: Alexey Kuznetsov <kuznet@virtuozzo.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: kill req->intr_uniqueKirill Tkhai
This field is not needed after the previous patch, since we can easily convert request ID to interrupt request ID and vice versa. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: change interrupt requests allocation algorithmKirill Tkhai
Using of two unconnected IDs req->in.h.unique and req->intr_unique does not allow to link requests to a hash table. We need can't use none of them as a key to calculate hash. This patch changes the algorithm of allocation of IDs for a request. Plain requests obtain even ID, while interrupt requests are encoded in the low bit. So, in next patches we will be able to use the rest of ID bits to calculate hash, and the hash will be the same for plain and interrupt requests. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: do not take fc->lock in fuse_request_send_background()Kirill Tkhai
Currently, we take fc->lock there only to check for fc->connected. But this flag is changed only on connection abort, which is very rare operation. So allow checking fc->connected under just fc->bg_lock and use this lock (as well as fc->lock) when resetting fc->connected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: introduce fc->bg_lockKirill Tkhai
To reduce contention of fc->lock, this patch introduces bg_lock for protection of fields related to background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows next patch to make async reads not requiring fc->lock, so async reads and writes will have better performance executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: add locking to max_background and congestion_threshold changesKirill Tkhai
Functions sequences like request_end()->flush_bg_queue() require that max_background and congestion_threshold are constant during their execution. Otherwise, checks like if (fc->num_background == fc->max_background) made in different time may behave not like expected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: use READ_ONCE on congestion_threshold and max_backgroundKirill Tkhai
Since they are of unsigned int type, it's allowed to read them unlocked during reporting to userspace. Let's underline this fact with READ_ONCE() macroses. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: use list_first_entry() in flush_bg_queue()Kirill Tkhai
This cleanup patch makes the function to use the primitive instead of direct dereferencing. Also, move fiq dereferencing out of cycle, since it's always constant. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: add support for copy_file_range()Niels de Vos
There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: fix blocked_waitq wakeupMiklos Szeredi
Using waitqueue_active() is racy. Make sure we issue a wake_up() unconditionally after storing into fc->blocked. After that it's okay to optimize with waitqueue_active() since the first wake up provides the necessary barrier for all waiters, not the just the woken one. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3c18ef8117f0 ("fuse: optimize wake_up") Cc: <stable@vger.kernel.org> # v3.10
2018-09-28fuse: set FR_SENT while lockedMiklos Szeredi
Otherwise fuse_dev_do_write() could come in and finish off the request, and the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...)) in request_end(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-09-28fuse: Fix use-after-free in fuse_dev_do_write()Kirill Tkhai
After we found req in request_find() and released the lock, everything may happen with the req in parallel: cpu0 cpu1 fuse_dev_do_write() fuse_dev_do_write() req = request_find(fpq, ...) ... spin_unlock(&fpq->lock) ... ... req = request_find(fpq, oh.unique) ... spin_unlock(&fpq->lock) queue_interrupt(&fc->iq, req); ... ... ... ... ... request_end(fc, req); fuse_put_request(fc, req); ... queue_interrupt(&fc->iq, req); Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-09-28fuse: Fix use-after-free in fuse_dev_do_read()Kirill Tkhai
We may pick freed req in this way: [cpu0] [cpu1] fuse_dev_do_read() fuse_dev_do_write() list_move_tail(&req->list, ...); ... spin_unlock(&fpq->lock); ... ... request_end(fc, req); ... fuse_put_request(fc, req); if (test_bit(FR_INTERRUPTED, ...)) queue_interrupt(fiq, req); Fix that by keeping req alive until we finish all manipulations. Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-09-27Merge tag 'for_v4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Jan writes: "an ext2 patch fixing fsync(2) for DAX mounts." * tag 'for_v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2, dax: set ext2_dax_aops for dax files
2018-09-27dax: Fix deadlock in dax_lock_mapping_entry()Jan Kara
When dax_lock_mapping_entry() has to sleep to obtain entry lock, it will fail to unlock mapping->i_pages spinlock and thus immediately deadlock against itself when retrying to grab the entry lock again. Fix the problem by unlocking mapping->i_pages before retrying. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Reported-by: Barret Rhoden <brho@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-27fanotify: store fanotify_init() flags in group's fanotify_dataAmir Goldstein
This averts the need to re-generate flags in fanotify_show_fdinfo() and sets the scene for addition of more upcoming flags without growing new members to the fanotify_data struct. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-26f2fs: mark inode dirty explicitly in recover_inode()Chao Yu
Mark inode dirty explicitly in the end of recover_inode() to make sure that all recoverable fields can be persisted later. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: fix to recover inode's crtime during PORChao Yu
Testcase to reproduce this bug: 1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd 2. mount -t f2fs /dev/sdd /mnt/f2fs 3. touch /mnt/f2fs/file 4. xfs_io -f /mnt/f2fs/file -c "fsync" 5. godown /mnt/f2fs 6. umount /mnt/f2fs 7. mount -t f2fs /dev/sdd /mnt/f2fs 8. xfs_io -f /mnt/f2fs/file -c "statx -r" stat.btime.tv_sec = 0 stat.btime.tv_nsec = 0 This patch fixes to recover inode creation time fields during mount. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: fix to recover inode's i_gc_failures during PORChao Yu
inode.i_gc_failures is used to indicate that skip count of migrating on blocks of inode, we should guarantee it can be recovered in sudden power-off case. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: fix to recover inode's i_flags during PORChao Yu
Testcase to reproduce this bug: 1. mkfs.f2fs /dev/sdd 2. mount -t f2fs /dev/sdd /mnt/f2fs 3. touch /mnt/f2fs/file 4. sync 5. chattr +A /mnt/f2fs/file 6. xfs_io -f /mnt/f2fs/file -c "fsync" 7. godown /mnt/f2fs 8. umount /mnt/f2fs 9. mount -t f2fs /dev/sdd /mnt/f2fs 10. lsattr /mnt/f2fs/file -----------------N- /mnt/f2fs/file But actually, we expect the corrct result is: -------A---------N- /mnt/f2fs/file The reason is we didn't recover inode.i_flags field during mount, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: fix to recover inode's project id during PORChao Yu
Testcase to reproduce this bug: 1. mkfs.f2fs -O extra_attr -O project_quota /dev/sdd 2. mount -t f2fs /dev/sdd /mnt/f2fs 3. touch /mnt/f2fs/file 4. sync 5. chattr -p 1 /mnt/f2fs/file 6. xfs_io -f /mnt/f2fs/file -c "fsync" 7. godown /mnt/f2fs 8. umount /mnt/f2fs 9. mount -t f2fs /dev/sdd /mnt/f2fs 10. lsattr -p /mnt/f2fs/file 0 -----------------N- /mnt/f2fs/file But actually, we expect the correct result is: 1 -----------------N- /mnt/f2fs/file The reason is we didn't recover inode.i_projid field during mount, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: update i_size after DIO completionJaegeuk Kim
This is related to ee70daaba82d ("xfs: update i_size after unwritten conversion in dio completion") If we update i_size during dio_write, dio_read can read out stale data, which breaks xfstests/465. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26f2fs: report ENOENT correctly in f2fs_renameJaegeuk Kim
This fixes wrong error report in f2fs_rename. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-25f2fs: fix remount problem of option io_bitsChengguang Xu
Currently we show mount option "io_bits=%u" as "io_size=%uKB", it will cause option parsing problem(unrecognized mount option) in remount. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-25nfsd: remove set but not used variable 'dirp'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/nfsd/vfs.c: In function 'nfsd_create': fs/nfsd/vfs.c:1279:16: warning: variable 'dirp' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-09-25NFSD introduce async copy featureOlga Kornievskaia
Upon receiving a request for async copy, create a new kthread. If we get asynchronous request, make sure to copy the needed arguments/state from the stack before starting the copy. Then start the thread and reply back to the client indicating copy is asynchronous. nfsd_copy_file_range() will copy in a loop over the total number of bytes is needed to copy. In case a failure happens in the middle, we ignore the error and return how much we copied so far. Once done creating a workitem for the callback workqueue and send CB_OFFLOAD with the results. The lifetime of the copy stateid is bound to the vfs copy. This way we don't need to keep the nfsd_net structure for the callback. We could keep it around longer so that an OFFLOAD_STATUS that came late would still get results, but clients should be able to deal without that. We handle OFFLOAD_CANCEL by sending a signal to the copy thread and calling kthread_stop. A client should cancel any ongoing copies before calling DESTROY_CLIENT; if not, we return a CLIENT_BUSY error. If the client is destroyed for some other reason (lease expiration, or server shutdown), we must clean up any ongoing copies ourselves. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> [colin.king@canonical.com: fix leak in error case] [bfields@fieldses.org: remove signalling, merge patches] Signed-off-by: J. Bruce Fields <bfields@redhat.com>