summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-11-30Merge tag 'fscache-fixes-20181130' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache and cachefiles fixes from David Howells: "Misc fixes: - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a race between a cache object lookup failing and someone attempting to reenable that object, thereby triggering an update of the object's attributes. - Fix an assertion failure at fs/fscache/operation.c:449 caused by a split atomic subtract and atomic read that allows a race to happen. - Fix a leak of backing pages when simultaneously reading the same page from the same object from two or more threads. - Fix a hang due to a race between a cache object being discarded and the corresponding cookie being reenabled. There are also some minor cleanups: - Cast an enum value to a different enum type to prevent clang from generating a warning. This shouldn't cause any sort of change in the emitted code. - Use ktime_get_real_seconds() instead of get_seconds(). This is just used to uniquify a filename for an object to be placed in the graveyard. Objects placed there are deleted by cachfilesd in userspace immediately thereafter. - Remove an initialised, but otherwise unused variable. This should have been entirely optimised away anyway" * tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache, cachefiles: remove redundant variable 'cache' cachefiles: avoid deprecated get_seconds() cachefiles: Explicitly cast enumerated type in put_object fscache: fix race between enablement and dropping of object cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active fscache: Fix race in fscache_op_complete() due to split atomic_sub & read cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30ocfs2: fix potential use after freePan Bian
ocfs2_get_dentry() calls iput(inode) to drop the reference count of inode, and if the reference count hits 0, inode is freed. However, in this function, it then reads inode->i_generation, which may result in a use after free bug. Move the put operation later. Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmasAndrea Arcangeli
After the VMA to register the uffd onto is found, check that it has VM_MAYWRITE set before allowing registration. This way we inherit all common code checks before allowing to fill file holes in shmem and hugetlbfs with UFFDIO_COPY. The userfaultfd memory model is not applicable for readonly files unless it's a MAP_PRIVATE. Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com Fixes: ff62a3421044 ("hugetlb: implement memfd sealing") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Hugh Dickins <hughd@google.com> Reported-by: Jann Horn <jannh@google.com> Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: <stable@vger.kernel.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30hfsplus: do not free node before usingPan Bian
hfs_bmap_free() frees node via hfs_bnode_put(node). However it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees node only when it is never used. Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30hfs: do not free node before usingPan Bian
hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees the node only when it is never again used. Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com Fixes: a1185ffa2fc ("HFS rewrite") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30ocfs2: fix deadlock caused by ocfs2_defrag_extent()Larry Chen
ocfs2_defrag_extent may fall into deadlock. ocfs2_ioctl_move_extents ocfs2_ioctl_move_extents ocfs2_move_extents ocfs2_defrag_extent ocfs2_lock_allocators_move_extents ocfs2_reserve_clusters inode_lock GLOBAL_BITMAP_SYSTEM_INODE __ocfs2_flush_truncate_log inode_lock GLOBAL_BITMAP_SYSTEM_INODE As backtrace shows above, ocfs2_reserve_clusters() will call inode_lock against the global bitmap if local allocator has not sufficient cluters. Once global bitmap could meet the demand, ocfs2_reserve_cluster will return success with global bitmap locked. After ocfs2_reserve_cluster(), if truncate log is full, __ocfs2_flush_truncate_log() will definitely fall into deadlock because it needs to inode_lock global bitmap, which has already been locked. To fix this bug, we could remove from ocfs2_lock_allocators_move_extents() the code which intends to lock global allocator, and put the removed code after __ocfs2_flush_truncate_log(). ocfs2_lock_allocators_move_extents() is referred by 2 places, one is here, the other does not need the data allocator context, which means this patch does not affect the caller so far. Link: http://lkml.kernel.org/r/20181101071422.14470-1-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes all over the place. The iov_iter one is this cycle regression (splice from UDP triggering WARN_ON()), the rest is older" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Use d_instantiate() rather than d_add() and don't d_drop() afs: Fix missing net error handling afs: Fix validation/callback interaction iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones exportfs: do not read dentry after free exportfs: fix 'passing zero to ERR_PTR()' warning aio: fix failure to put the file pointer sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-30Merge tag 'pstore-v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "Fix corrupted compression due to unlucky size choice with ECC" * tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Correctly calculate usable PRZ bytes
2018-11-30fs/locks: allow a lock request to block other requests.NeilBrown
Currently, a lock can block pending requests, but all pending requests are equal. If lots of pending requests are mutually exclusive, this means they will all be woken up and all but one will fail. This can hurt performance. So we will allow pending requests to block other requests. Only the first request will be woken, and it will wake the others. This patch doesn't implement this fully, but prepares the way. - It acknowledges that a request might be blocking other requests, and when the request is converted to a lock, those blocked requests are moved across. - When a request is requeued or discarded, all blocked requests are woken. - When deadlock-detection looks for the lock which blocks a given request, we follow the chain of ->fl_blocker all the way to the top. Tested-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: use properly initialized file_lock when unlocking.NeilBrown
Both locks_remove_posix() and locks_remove_flock() use a struct file_lock without calling locks_init_lock() on it. This means the various list_heads are not initialized, which will become a problem with a later patch. So change them both to initialize properly. For flock locks, this involves using flock_make_lock(), and changing it to allow a file_lock to be passed in, so memory allocation isn't always needed. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30ocfs2: properly initial file_lock used for unlock.NeilBrown
Rather than assuming all-zeros is sufficient, use the available API to initialize the file_lock structure use for unlock. VFS-level changes will soon make it important that the list_heads in file_lock are always properly initialized. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30gfs2: properly initial file_lock used for unlock.NeilBrown
Rather than assuming all-zeros is sufficient, use the available API to initialize the file_lock structure use for unlock. VFS-level changes will soon make it important that the list_heads in file_lock are always properly initialized. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30NFS: use locks_copy_lock() to copy locks.NeilBrown
Using memcpy() to copy lock requests leaves the various list_head in an inconsistent state. As we will soon attach a list of conflicting request to another pending request, we need these lists to be consistent. So change NFS to use locks_init_lock/locks_copy_lock instead of memcpy. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: split out __locks_wake_up_blocks().NeilBrown
This functionality will be useful in future patches, so split it out from locks_wake_up_blocks(). Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: rename some lists and pointers.NeilBrown
struct file lock contains an 'fl_next' pointer which is used to point to the lock that this request is blocked waiting for. So rename it to fl_blocker. The fl_blocked list_head in an active lock is the head of a list of blocked requests. In a request it is a node in that list. These are two distinct uses, so replace with two list_heads with different names. fl_blocked_requests is the head of a list of blocked requests fl_blocked_member is a node in a member of that list. The two different list_heads are never used at the same time, but that will change in a future patch. Note that a tracepoint is changed to report fl_blocker instead of fl_next. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fscache, cachefiles: remove redundant variable 'cache'Colin Ian King
Variable 'cache' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'cache' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30cachefiles: avoid deprecated get_seconds()Arnd Bergmann
get_seconds() returns an unsigned long can overflow on some architectures and is deprecated because of that. In cachefs, we cast that number to a a 32-bit integer, which will overflow in year 2106 on all architectures. As confirmed by David Howells, the overflow probably isn't harmful in the end, since the timestamps are only used to make the file names unique, but they don't strictly have to be in monotonically increasing order since the files only exist in order to be deleted as quickly as possible. Moving to ktime_get_real_seconds() avoids the deprecated interface. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30cachefiles: Explicitly cast enumerated type in put_objectNathan Chancellor
Clang warns when one enumerated type is implicitly converted to another. fs/cachefiles/namei.c:247:50: warning: implicit conversion from enumeration type 'enum cachefiles_obj_ref_trace' to different enumeration type 'enum fscache_obj_ref_trace' [-Wenum-conversion] cache->cache.ops->put_object(&xobject->fscache, cachefiles_obj_put_wait_retry); Silence this warning by explicitly casting to fscache_obj_ref_trace, which is also done in put_object. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30fscache: fix race between enablement and dropping of objectNeilBrown
It was observed that a process blocked indefintely in __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP to be cleared via fscache_wait_for_deferred_lookup(). At this time, ->backing_objects was empty, which would normaly prevent __fscache_read_or_alloc_page() from getting to the point of waiting. This implies that ->backing_objects was cleared *after* __fscache_read_or_alloc_page was was entered. When an object is "killed" and then "dropped", FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is ->backing_objects cleared. This leaves a window where something else can set FSCACHE_COOKIE_LOOKING_UP and __fscache_read_or_alloc_page() can start waiting, before ->backing_objects is cleared There is some uncertainty in this analysis, but it seems to be fit the observations. Adding the wake in this patch will be handled correctly by __fscache_read_or_alloc_page(), as it checks if ->backing_objects is empty again, after waiting. Customer which reported the hang, also report that the hang cannot be reproduced with this fix. The backtrace for the blocked process looked like: PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh" #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache] #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache] #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs] #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs] #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs] #10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756 #11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa #12 [ffff881ff43eff18] sys_read at ffffffff811fda62 #13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30fs: fix lost error code in dio_completeMaximilian Heyne
commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the generic_write_sync prototype") reworked callers of generic_write_sync(), and ended up dropping the error return for the directio path. Prior to that commit, in dio_complete(), an error would be bubbled up the stack, but after that commit, errors passed on to dio_complete were eaten up. This was reported on the list earlier, and a fix was proposed in https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but never followed up with. We recently hit this bug in our testing where fencing io errors, which were previously erroring out with EIO, were being returned as success operations after this commit. The fix proposed on the list earlier was a little short -- it would have still called generic_write_sync() in case `ret` already contained an error. This fix ensures generic_write_sync() is only called when there's no pending error in the write. Additionally, transferred is replaced with ret to bring this code in line with other callers. Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype") Reported-by: Ravi Nankani <rnankani@amazon.com> Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Torsten Mehlan <tomeh@amazon.de> CC: Uwe Dannowski <uwed@amazon.de> CC: Amit Shah <aams@amazon.de> CC: David Woodhouse <dwmw@amazon.co.uk> CC: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30block: avoid extra bio reference for async O_DIRECTChristoph Hellwig
The bio referencing has a trick that doesn't do any actual atomic inc/dec on the reference count until we have to elevator to > 1. For the async IO O_DIRECT case, we can't use the simple DIO variants, so we use __blkdev_direct_IO(). It always grabs an extra reference to the bio after allocation, which means we then enter the slower path of actually having to do atomic_inc/dec on the count. We don't need to do that for the async case, unless we end up going multi-bio, in which case we're already doing huge amounts of IO. For the smaller IO case (< BIO_MAX_PAGES), we can do without the extra ref. Based on an earlier patch (and commit log) from Jens Axboe. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-29afs: Use d_instantiate() rather than d_add() and don't d_drop()David Howells
Use d_instantiate() rather than d_add() and don't d_drop() in afs_vnode_new_inode(). The dentry shouldn't be removed as it's not changing its name. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix missing net error handlingDavid Howells
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and ERFKILL) that it doesn't handle in its server/address rotation algorithms. They cause the probing and rotation to abort immediately rather than rotating. Fix this by: (1) Abstracting out the error prioritisation from the VL and FS rotation algorithms into a common function and expand usage into the server probing code. When multiple errors are available, this code selects the one we'd prefer to return. (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL. Fixes: 0fafdc9f888b ("afs: Fix file locking") Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix validation/callback interactionDavid Howells
When afs_validate() is called to validate a vnode (inode), there are two unhandled cases in the fastpath at the top of the function: (1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break counters match and the data has expired, then there's an implicit case in which the vnode needs revalidating. This has no consequences since the default "valid = false" set at the top of the function happens to do the right thing. (2) If the vnode is not promised and it hasn't been deleted (AFS_VNODE_DELETED is not set) then there's a default case we're not handling in which the vnode is invalid. If the vnode is invalid, we need to bring cb_s_break and cb_v_break up to date before we refetch the status. As a consequence, once the server loses track of the client (ie. sufficient time has passed since we last sent it an operation), it will send us a CB.InitCallBackState* operation when we next try to talk to it. This calls afs_init_callback_state() which increments afs_server::cb_s_break, but this then doesn't propagate to the afs_vnode record. The result being that every afs_validate() call thereafter sends a status fetch operation to the server. Clarify and fix this by: (A) Setting valid in all the branches rather than initialising it at the top so that the compiler catches where we've missed. (B) Restructuring the logic in the 'promised' branch so that we set valid to false if the callback is due to expire (or has expired) and so that the final case is that the vnode is still valid. (C) Adding an else-statement that ups cb_s_break and cb_v_break if the promised and deleted cases don't match. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29VFS: use synchronize_rcu_expedited() in namespace_unlock()NeilBrown
The synchronize_rcu() in namespace_unlock() is called every time a filesystem is unmounted. If a great many filesystems are mounted, this can cause a noticable slow-down in, for example, system shutdown. The sequence: mkdir -p /tmp/Mtest/{0..5000} time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done time umount /tmp/Mtest/* on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and 100 seconds to unmount them. Boot the same VM with 1 CPU and it takes 18 seconds to mount the tmpfs filesystems, but only 36 to unmount. If we change the synchronize_rcu() to synchronize_rcu_expedited() the umount time on a 4-cpu VM drop to 0.6 seconds I think this 200-fold speed up is worth the slightly high system impact of using synchronize_rcu_expedited(). Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29pstore/ram: Correctly calculate usable PRZ bytesKees Cook
The actual number of bytes stored in a PRZ is smaller than the bytes requested by platform data, since there is a header on each PRZ. Additionally, if ECC is enabled, there are trailing bytes used as well. Normally this mismatch doesn't matter since PRZs are circular buffers and the leading "overflow" bytes are just thrown away. However, in the case of a compressed record, this rather badly corrupts the results. This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1". Any stored crashes would not be uncompressable (producing a pstorefs "dmesg-*.enc.z" file), and triggering errors at boot: [ 2.790759] pstore: crypto_comp_decompress failed, ret = -22! Backporting this depends on commit 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Reported-by: Joel Fernandes <joel@joelfernandes.org> Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-29Merge tag 'fixes_for_v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and udf fixes from Jan Kara: "Three small ext2 and udf fixes" * tag 'fixes_for_v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix potential use after free ext2: initialize opts.s_mount_opt as zero before using it udf: Allow mounting volumes with incorrect identification strings
2018-11-28nfsd: clean up indentation, increase indentation in switch statementColin Ian King
Trivial fix to clean up indentation, add in missing tabs. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28nfsd: fix a warning in __cld_pipe_upcall()Scott Mayhew
__cld_pipe_upcall() emits a "do not call blocking ops when !TASK_RUNNING" warning due to the dput() call in rpc_queue_upcall(). Fix it by using a completion instead of hand coding the wait. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28nfsd4: fix crash on writing v4_end_grace before nfsd startupJ. Bruce Fields
Anatoly Trosinenko reports that this: 1) Checkout fresh master Linux branch (tested with commit e195ca6cb) 2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build 3) From `kvm-xfstests shell`: results in NULL dereference in locks_end_grace. Check that nfsd has been started before trying to end the grace period. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28dax: Don't access a freed inodeMatthew Wilcox
After we drop the i_pages lock, the inode can be freed at any time. The get_unlocked_entry() code has no choice but to reacquire the lock, so it can't be used here. Create a new wait_entry_unlocked() which takes care not to acquire the lock or dereference the address_space in any way. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-28dax: Check page->mapping isn't NULLMatthew Wilcox
If we race with inode destroy, it's possible for page->mapping to be NULL before we even enter this routine, as well as after having slept waiting for the dax entry to become unlocked. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: <stable@vger.kernel.org> Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-28Merge tag 'for-4.20-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Some of these bugs are being hit during testing so we'd like to get them merged, otherwise there are usual stability fixes for stable trees" * tag 'for-4.20-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: relocation: set trans to be NULL after ending transaction Btrfs: fix race between enabling quotas and subvolume creation Btrfs: send, fix infinite loop due to directory rename dependencies Btrfs: ensure path name is null terminated at btrfs_control_ioctl Btrfs: fix rare chances for data loss when doing a fast fsync btrfs: Always try all copies when reading extent buffers
2018-11-28cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is activeKiran Kumar Modukuri
[Description] In a heavily loaded system where the system pagecache is nearing memory limits and fscache is enabled, pages can be leaked by fscache while trying read pages from cachefiles backend. This can happen because two applications can be reading same page from a single mount, two threads can be trying to read the backing page at same time. This results in one of the threads finding that a page for the backing file or netfs file is already in the radix tree. During the error handling cachefiles does not clean up the reference on backing page, leading to page leak. [Fix] The fix is straightforward, to decrement the reference when error is encountered. [dhowells: Note that I've removed the clearance and put of newpage as they aren't attested in the commit message and don't appear to actually achieve anything since a new page is only allocated is newpage!=NULL and any residual new page is cleared before returning.] [Testing] I have tested the fix using following method for 12+ hrs. 1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs 2) create 10000 files of 2.8MB in a NFS mount. 3) start a thread to simulate heavy VM presssure (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)& 4) start multiple parallel reader for data set at same time find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & .. .. find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & 5) finally check using cat /proc/fs/fscache/stats | grep -i pages ; free -h , cat /proc/meminfo and page-types -r -b lru to ensure all pages are freed. Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com> Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> [dja: forward ported to current upstream] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-28dlm: NULL check before kmem_cache_destroy is not neededWen Yang
kmem_cache_destroy(NULL) is safe, so removes NULL check before freeing the mem. This patch also fix ifnullfree.cocci warnings. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-28cachefiles: Fix an assertion failure when trying to update a failed objectDavid Howells
If cachefiles gets an error other then ENOENT when trying to look up an object in the cache (in this case, EACCES), the object state machine will eventually transition to the DROP_OBJECT state. This state invokes fscache_drop_object() which tries to sync the auxiliary data with the cache (this is done lazily since commit 402cb8dda949d) on an incomplete cache object struct. The problem comes when cachefiles_update_object_xattr() is called to rewrite the xattr holding the data. There's an assertion there that the cache object points to a dentry as we're going to update its xattr. The assertion trips, however, as dentry didn't get set. Fix the problem by skipping the update in cachefiles if the object doesn't refer to a dentry. A better way to do it could be to skip the update from the DROP_OBJECT state handler in fscache, but that might deny the cache the opportunity to update intermediate state. If this error occurs, the kernel log includes lines that look like the following: CacheFiles: Lookup failed error -13 CacheFiles: CacheFiles: Assertion failed ------------[ cut here ]------------ kernel BUG at fs/cachefiles/xattr.c:138! ... Workqueue: fscache_object fscache_object_work_func [fscache] RIP: 0010:cachefiles_update_object_xattr.cold.4+0x18/0x1a [cachefiles] ... Call Trace: cachefiles_update_object+0xdd/0x1c0 [cachefiles] fscache_update_aux_data+0x23/0x30 [fscache] fscache_drop_object+0x18e/0x1c0 [fscache] fscache_object_work_func+0x74/0x2b0 [fscache] process_one_work+0x18d/0x340 worker_thread+0x2e/0x390 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_bind+0x30/0x30 ret_from_fork+0x35/0x40 Note that there are actually two issues here: (1) EACCES happened on a cache object and (2) an oops occurred. I think that the second is a consequence of the first (it certainly looks like it ought to be). This patch only deals with the second. Fixes: 402cb8dda949 ("fscache: Attach the index key and aux data to the cookie") Reported-by: Zhibin Li <zhibli@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-28f2fs: avoid frequent costly fsck triggersJaegeuk Kim
If we want to re-enable nat_bits, we rely on fsck which requires full scan of directory tree. Let's do that by regular fsck or unclean shutdown. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-27lockd: fix decoding of TEST resultsJ. Bruce Fields
We fail to advance the read pointer when reading the stat.oh field that identifies the lock-holder in a TEST result. This turns out not to matter if the server is knfsd, which always returns a zero-length field. But other servers (Ganesha is an example) may not do this. The result is bad values in fcntl F_GETLK results. Fix this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: skip unused assignmentJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: forbid all renames during grace periodJ. Bruce Fields
The idea here was that renaming a file on a nosubtreecheck export would make lookups of the old filehandle return STALE, making it impossible for clients to reclaim opens. But during the grace period I think we should also hold off on operations that would break delegations. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: remove unused nfs4_check_olstateid parameterJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: zero-length WRITE should succeedJ. Bruce Fields
Zero-length writes are legal; from 5661 section 18.32.3: "If the count is zero, the WRITE will succeed and return a count of zero subject to permissions checking". This check is unnecessary and is causing zero-length reads to return EINVAL. Cc: stable@vger.kernel.org Fixes: 3fd9557aec91 "NFSD: Refactor the generic write vector fill helper" Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27fs/file: Replace synchronize_sched() with synchronize_rcu()Paul E. McKenney
Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org>
2018-11-27kernfs: Improve kernfs_notify() poll notification latencyRadu Rendec
kernfs_notify() does two notifications: poll and fsnotify. Originally, both notifications were done from scheduled work context and all that kernfs_notify() did was schedule the work. This patch simply moves the poll notification from the scheduled work handler to kernfs_notify(). The fsnotify notification still needs to be done from scheduled work context because it can sleep (it needs to lock a mutex). If the poll notification is time critical (the notified thread needs to wake as quickly as possible), it's better to do it from kernfs_notify() directly. One example is calling sysfs_notify_dirent() from a hardware interrupt handler to wake up a thread and handle the interrupt in user space. Signed-off-by: Radu Rendec <radu.rendec@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27ext2: fix potential use after freePan Bian
The function ext2_xattr_set calls brelse(bh) to drop the reference count of bh. After that, bh may be freed. However, following brelse(bh), it reads bh->b_data via macro HDR(bh). This may result in a use-after-free bug. This patch moves brelse(bh) after reading field. CC: stable@vger.kernel.org Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-27ext2: initialize opts.s_mount_opt as zero before using itxingaopeng
We need to initialize opts.s_mount_opt as zero before using it, else we may get some unexpected mount options. Fixes: 088519572ca8 ("ext2: Parse mount options into a dedicated structure") CC: stable@vger.kernel.org Signed-off-by: xingaopeng <xingaopeng@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-26f2fs: fix m_may_create to make OPU DIO write correctlyJia Zhu
Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to update new block address correctly for OPUJia Zhu
Previously, we allocated a new block address for OPU mode in direct_IO. But the new address couldn't be assigned to @map->m_pblk correctly. This patch fix it. Cc: <stable@vger.kernel.org> Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: adjust trace print in f2fs_get_victim() to cover all pathsSahitya Tummala
Adjust the trace print in f2fs_get_victim() to cover GC done by F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to allow node segment for GC by ioctl pathSahitya Tummala
Allow node type segments also to be GC'd via f2fs ioctl F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>