summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-12-15nfs: don't wait on commit in nfs_commit_inode() if there were no commit requestsScott Mayhew
If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15nfs: fix a deadlock in nfs client initializationScott Mayhew
The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: arch: define weak abort() mm, oom_reaper: fix memory corruption kernel: make groups_sort calling a responsibility group_info allocators mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' tools/slabinfo-gnuplot: force to use bash shell kcov: fix comparison callback signature mm/slab.c: do not hash pointers when debugging slab mm/page_alloc.c: avoid excessive IRQ disabled times in free_unref_page_list() mm/memory.c: mark wp_huge_pmd() inline to prevent build failure scripts/faddr2line: fix CROSS_COMPILE unset error Documentation/vm/zswap.txt: update with same-value filled page feature exec: avoid gcc-8 warning for get_task_comm autofs: fix careless error in recent commit string.h: workaround for increased stack usage mm/kmemleak.c: make cond_resched() rate-limiting more efficient lib/rbtree,drm/mm: add rbtree_replace_node_cached() include/linux/idr.h: add #include <linux/bug.h>
2017-12-14kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker
In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14exec: avoid gcc-8 warning for get_task_commArnd Bergmann
gcc-8 warns about using strncpy() with the source size as the limit: fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] This is indeed slightly suspicious, as it protects us from source arguments without NUL-termination, but does not guarantee that the destination is terminated. This keeps the strncpy() to ensure we have properly padded target buffer, but ensures that we use the correct length, by passing the actual length of the destination buffer as well as adding a build-time check to ensure it is exactly TASK_COMM_LEN. There are only 23 callsites which I all reviewed to ensure this is currently the case. We could get away with doing only the check or passing the right length, but it doesn't hurt to do both. Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Aleksa Sarai <asarai@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14autofs: fix careless error in recent commitNeilBrown
Commit ecc0c469f277 ("autofs: don't fail mount for transient error") was meant to replace an 'if' with a 'switch', but instead added the 'switch' leaving the case in place. Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name Fixes: ecc0c469f277 ("autofs: don't fail mount for transient error") Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14Merge tag '4.15-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Small SMB3 fixes for stable and 4.15rc" * tag '4.15-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: CIFS: don't log STATUS_NOT_FOUND errors for DFS cifs: fix NULL deref in SMB2_read
2017-12-14xfs: allow CoW remap transactions to use reserve blocksDarrick J. Wong
Since we as yet have no way of holding on to the indlen blocks that are reserved as part of CoW fork delalloc reservations, let the CoW remap transaction dip into the reserves so that we avoid failing writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: avoid infinite loop when cancelling CoW blocks after writeback failureDarrick J. Wong
When we're cancelling a cow range, we don't always delete each extent that we iterate, so we have to move icur backwards in the list to avoid an infinite loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mappingDarrick J. Wong
We don't hold the ilock through the entire sequence of xfs_writepage_map -> xfs_map_cow -> xfs_reflink_find_cow_mapping. This means that we can race with another thread that is trying to clear the inode reflink flag, with the result that the flag is set for the xfs_map_cow check but cleared before we get to the assert in find_cow_mapping. When this happens, we blow the assert even though everything is fine. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: remove dest file's post-eof preallocations before reflinkingDarrick J. Wong
If we try to reflink into a file with post-eof preallocations at an offset well past the preallocations, we increase i_size as one would expect. However, those allocations do not have page cache backing them, so they won't get cleaned out on their own. This leads to asserts in the collapse/insert range code and xfs_destroy_inode when they encounter delalloc extents they weren't expecting to find. Since there are plenty of other places where we dump those post-eof blocks, do the same to the reflink destination file before we start remapping extents. This was found by adding clonerange support to fsstress and running it in write-only mode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: move xfs_iext_insert tracepoint to report useful informationDarrick J. Wong
Move the tracepoint in xfs_iext_insert to after the point where we've inserted the extent because otherwise we report stale extent data in the ftrace output. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: account for null transactions in bunmapiDarrick J. Wong
In e1a4e37cc7b665 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent"), we try to constrain the amount of real extents we unmap from the data fork in a given call so that we don't blow out transaction reservations. However, not all bunmapi operations require a transaction -- if we're only removing a delalloc extent, no transaction is needed, so we have to code against that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: hold xfs_buf locked between shortform->leaf conversion and the addition ↵Darrick J. Wong
of an attribute The new attribute leaf buffer is not held locked across the transaction roll between the shortform->leaf modification and the addition of the new entry. As a result, the attribute buffer modification being made is not atomic from an operational perspective. Hence the AIL push can grab it in the transient state of "just created" after the initial transaction is rolled, because the buffer has been released. This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero, treating this as in-memory corruption, and shutting down the filesystem. Darrick ported the original patch to 4.15 and reworked it use the xfs_defer_bjoin helper and hold/join the buffer correctly across the second transaction roll. Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: add the ability to join a held buffer to a defer_opsDarrick J. Wong
In certain cases, defer_ops callers will lock a buffer and want to hold the lock across transaction rolls. Similar to ijoined inodes, we want to dirty & join the buffer with each transaction roll in defer_finish so that afterwards the caller still owns the buffer lock and we haven't inadvertently pinned the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14ovl: fix overlay: warning prefixAmir Goldstein
Conform two stray warning messages to the standard overlayfs: prefix. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-13Merge tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Here are a few more bug fixes & cleanups for 4.15-rc4: - clean up duplicate includes - remove ancient 'no-alloc' crap code that occasionally caused hard fs shutdowns due to lack of proper space reservations - fix regression in FIEMAP behavior when reporting xattr extents" * tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: make iomap_begin functions trim iomaps consistently xfs: remove "no-allocation" reservations for file creations fs: xfs: remove duplicate includes
2017-12-13media: dvb_frontend: Add compat_ioctl callbackJaedon Shin
Adds compat_ioctl for 32-bit user space applications on a 64-bit system. [m.chehab@osg.samsung.com: add missing include compat.h] Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-12-12gfs2: Add a crc field to resource group headersAndrew Price
Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This allows us to check resource group headers' integrity and removes the requirement to check them against the rindex entries in fsck. If this field is found to be zero, it should be ignored (or updated with an accurate value). Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add rindex fields to rgrp headersAndrew Price
Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields are identical to their counterparts in struct gfs2_rindex and are intended to reduce the use of the rindex. For now the fields are only written back as the in-memory equivalents in struct gfs2_rgrpd are set using values from the rindex. However, they are needed at this point so that userspace can make use of them, allowing a migration away from the rindex over time. The new fields take up previously reserved space which was explicitly zeroed on write so, in clusters with mixed kernels, these fields could get zeroed after being set and this should not be treated as an error. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add a next-resource-group pointer to resource groupsAndrew Price
Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The rg_skip field has the following meaning: - If rg_skip is zero, it is considered unset and not useful. - If rg_skip is non-zero, its value will be the number of blocks between this rgrp's address and the next rgrp's address. This can be used as a hint by fsck.gfs2 when rebuilding a bad rindex, for example. This will provide less dependency on the rindex in future, and allow tools such as fsck.gfs2 to iterate the resource groups without keeping the rindex around. The field is updated in gfs2_rgrp_out() so that existing file systems will have it set. This means that any resource groups that aren't ever written will not be updated. The final rgrp is a special case as there is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is extended). Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so the rg_skip field can get set back to 0 in cases where nodes with and without this patch are mixed in a cluster. In some cases, the field may bounce between being set by one node and then zeroed by another which may harm performance slightly, e.g. when two nodes create many small files. In testing this situation is rare but it becomes more likely as the filesystem fills up and there are fewer resource groups to choose from. The problem goes away when all nodes are running with this patch. Dipping into the space currently occupied by the rg_reserved field would have resulted in the same problem as it is also explicitly zeroed, so unfortunately there is no other way around it. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12btrfs: allow us to inject errors at io_ctl_initJosef Bacik
This was instrumental in reproducing a space cache bug. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12btrfs: make open_ctree error injectableJosef Bacik
This allows us to do error injection with BPF for open_ctree. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-11ext4: fix crash when a directory's i_size is too smallChandan Rajendra
On a ppc64 machine, when mounting a fuzzed ext2 image (generated by fsfuzzer) the following call trace is seen, VFS: brelse: Trying to free free buffer WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40 .__brelse.part.6+0x20/0x40 (unreliable) .ext4_find_entry+0x384/0x4f0 .ext4_lookup+0x84/0x250 .lookup_slow+0xdc/0x230 .walk_component+0x268/0x400 .path_lookupat+0xec/0x2d0 .filename_lookup+0x9c/0x1d0 .vfs_statx+0x98/0x140 .SyS_newfstatat+0x48/0x80 system_call+0x58/0x6c This happens because the directory that ext4_find_entry() looks up has inode->i_size that is less than the block size of the filesystem. This causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not reading any of the directory file's blocks. This renders the entries in bh_use[] array to continue to have garbage data. buffer_uptodate() on bh_use[0] can then return a zero value upon which brelse() function is invoked. This commit fixes the bug by returning -ENOENT when the directory file has no associated blocks. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
2017-12-11Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', ↵Paul E. McKenney
'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched() dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI fixes.2017.12.11a: Miscellaneous fixes srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends() torture.2017.12.11a: Torture-testing update
2017-12-11rhashtable: Change rhashtable_walk_start to return voidTom Herbert
Most callers of rhashtable_walk_start don't care about a resize event which is indicated by a return value of -EAGAIN. So calls to rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something like this is common: ret = rhashtable_walk_start(rhiter); if (ret && ret != -EAGAIN) goto out; Since zero and -EAGAIN are the only possible return values from the function this check is pointless. The condition never evaluates to true. This patch changes rhashtable_walk_start to return void. This simplifies code for the callers that ignore -EAGAIN. For the few cases where the caller cares about the resize event, particularly where the table can be walked in mulitple parts for netlink or seq file dump, the function rhashtable_walk_start_check has been added that returns -EAGAIN on a resize event. Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11ovl: Use PTR_ERR_OR_ZERO()Vasyl Gomonovych
Fix ptr_ret.cocci warnings: fs/overlayfs/overlayfs.h:179:11-17: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: Sync upper dirty data when syncing overlayfsChengguang Xu
When executing filesystem sync or umount on overlayfs, dirty data does not get synced as expected on upper filesystem. This patch fixes sync filesystem method to keep data consistency for overlayfs. Signed-off-by: Chengguang Xu <cgxu@mykernel.net> Fixes: e593b2bf513d ("ovl: properly implement sync_filesystem()") Cc: <stable@vger.kernel.org> #4.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: update ctx->pos on impure dir iterationAmir Goldstein
This fixes a regression with readdir of impure dir in overlayfs that is shared to VM via 9p fs. Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Cc: <stable@vger.kernel.org> #4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: Pass ovl_get_nlink() parameters in right orderVivek Goyal
Right now we seem to be passing index as "lowerdentry" and origin.dentry as "upperdentry". IIUC, we should pass these parameters in reversed order and this looks like a bug. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: don't follow redirects if redirect_dir=offMiklos Szeredi
Overlayfs is following redirects even when redirects are disabled. If this is unintentional (probably the majority of cases) then this can be a problem. E.g. upper layer comes from untrusted USB drive, and attacker crafts a redirect to enable read access to otherwise unreadable directories. If "redirect_dir=off", then turn off following as well as creation of redirects. If "redirect_dir=follow", then turn on following, but turn off creation of redirects (which is what "redirect_dir=off" does now). This is a backward incompatible change, so make it dependent on a config option. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-10ext4: add missing error check in __ext4_new_inode()Theodore Ts'o
It's possible for ext4_get_acl() to return an ERR_PTR. So we need to add a check for this case in __ext4_new_inode(). Otherwise on an error we can end up oops the kernel. This was getting triggered by xfstests generic/388, which is a test which exercises the shutdown code path. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-12-10hpfs: don't bother with the i_version counter or f_versionJeff Layton
HPFS does not set SB_I_VERSION and does not use the i_version counter internally. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Reviewed-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-10Merge tag 'for-4.15-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "This contains a few fixes (error handling, quota leak, FUA vs nobarrier mount option). There's one one worth mentioning separately - an off-by-one fix that leads to overwriting first byte of an adjacent page with 0, out of bounds of the memory allocated by an ioctl. This is under a privileged part of the ioctl, can be triggerd in some subvolume layouts" * tag 'for-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Fix possible off-by-one in btrfs_search_path_in_tree Btrfs: disable FUA if mounted with nobarrier btrfs: fix missing error return in btrfs_drop_snapshot btrfs: handle errors while updating refcounts in update_ref_for_cow btrfs: Fix quota reservation leak on preallocated files
2017-12-09VFS: Handle lazytime in do_mount()Markus Trippelsdorf
Since commit e462ec50cb5fa ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") the lazytime mount option doesn't get passed on anymore. Fix the issue by handling the option in do_mount(). Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-08xfs: make iomap_begin functions trim iomaps consistentlyDarrick J. Wong
Historically, the XFS iomap_begin function only returned mappings for exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups. The current vfs iomap consumers are only set up to deal with trimmed mappings. xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is inconsistent with the current iomap usage. Remove the flag so that both iomap_begin functions behave the same way. FWIW this also fixes a behavioral regression in xattr FIEMAP that was introduced in 4.8 wherein attr fork extents are no longer trimmed like they used to be. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-08xfs: remove "no-allocation" reservations for file creationsChristoph Hellwig
If we create a new file we will need an inode, and usually some metadata in the parent direction. Aiming for everything to go well despite the lack of a reservation leads to dirty transactions cancelled under a heavy create/delete load. This patch removes those nospace transactions, which will lead to slightly earlier ENOSPC on some workloads, but instead prevent file system shutdowns due to cancelling dirty transactions for others. A customer could observe assertations failures and shutdowns due to cancelation of dirty transactions during heavy NFS workloads as shown below: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]--- 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-12-08fs: xfs: remove duplicate includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-12-08ceph: drop negative child dentries before try pruning inode's aliasYan, Zheng
Negative child dentry holds reference on inode's alias, it makes d_prune_aliases() do nothing. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-12-07vfs: remove unused hardirq.hYang Shi
Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by vfs at all. So, remove the unused hardirq.h. Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-06proc: show si_ptr in /proc/<pid>/timers without hashingLinus Torvalds
It's a user pointer, and while the permissions of the file are pretty questionable (should it really be readable to everybody), hashing the pointer isn't going to be the solution. We should take a closer look at more of the /proc/<pid> file permissions in general. Sure, we do want many of them to often be readable (for 'ps' and friends), but I think we should probably do a few conversions from S_IRUGO to S_IRUSR. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-07btrfs: Fix possible off-by-one in btrfs_search_path_in_treeNikolay Borisov
The name char array passed to btrfs_search_path_in_tree is of size BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes are in the range of [0, 4079]. Currently the code uses the define but this represents an off-by-one. Implications: Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be written to extra space, not some padding that could be provided by the allocator. btrfs-progs store the arguments on stack, but kernel does own copy of the ioctl buffer and the off-by-one overwrite does not affect userspace, but the ending 0 might be lost. Kernel ioctl buffer is allocated dynamically so we're overwriting somebody else's memory, and the ioctl is privileged if args.objectid is not 256. Which is in most cases, but resolving a subvolume stored in another directory will trigger that path. Before this patch the buffer was one byte larger, but then the -1 was not added. Fixes: ac8e9819d71f907 ("Btrfs: add search and inode lookup ioctls") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ added implications ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-12-07Btrfs: disable FUA if mounted with nobarrierOmar Sandoval
I was seeing disk flushes still happening when I mounted a Btrfs filesystem with nobarrier for testing. This is because we use FUA to write out the first super block, and on devices without FUA support, the block layer translates FUA to a flush. Even on devices supporting true FUA, using FUA when we asked for no barriers is surprising. Fixes: 387125fc722a8ed ("Btrfs: fix barrier flushes") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-12-07btrfs: fix missing error return in btrfs_drop_snapshotJeff Mahoney
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the error but then return 0 anyway due to mixing err and ret. Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling") Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-12-07btrfs: handle errors while updating refcounts in update_ref_for_cowJeff Mahoney
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting time out of commit trans) the assumption that btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has been false. The qgroup operations call into btrfs_search_slot and friends and can now return the full spectrum of error codes. Fortunately, the fix here is easy since update_ref_for_cow failing is already handled so we just need to bail early with the error code. Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...) Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Edmund Nadolski <enadolski@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-12-07btrfs: Fix quota reservation leak on preallocated filesJustin Maggard
Commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to") changed the behavior of __btrfs_buffered_write() so that it first tries to get a data space reservation, and then skips the relatively expensive nocow check if the reservation succeeded. If we have quotas enabled, the data space reservation also includes a quota reservation. But in the rewrite case, the space has already been accounted for in qgroups. So btrfs_check_data_free_space() increases the quota reservation, but it never gets decreased when the data actually gets written and overwrites the pre-existing data. So we're left with both the qgroup and qgroup reservation accounting for the same space. This commit adds the missing btrfs_qgroup_free_data() call in the case of BTRFS_ORDERED_PREALLOC extents. Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Signed-off-by: Justin Maggard <jmaggard@netgear.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-12-06CIFS: don't log STATUS_NOT_FOUND errors for DFSAurelien Aptel
cifs.ko makes DFS queries regardless of the type of the server and non-DFS servers are common. This often results in superfluous logging of non-critical errors. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2017-12-06cifs: fix NULL deref in SMB2_readRonnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2017-12-06Merge Linus's staging merge point into staging-nextGreg Kroah-Hartman
This resolves the merge issue pointed out by Stephen in drivers/iio/adc/meson_saradc.c. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05fs/file.c: trim includesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>