summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-09-22nfs: use safe, interruptible sleeps when waiting to retry LOCKJeff Layton
We actually want to use TASK_INTERRUPTIBLE sleeps when we're in the process of polling for a NFSv4 lock. If there is a signal pending when the task wakes up, then we'll be returning an error anyway. So, we might as well wake up immediately for non-fatal signals as well. That allows us to return to userland more quickly in that case, but won't change the error that userland sees. Also, there is no need to use the *_unsafe sleep variants here, as no vfs-layer locks should be held at this point. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22nfs: eliminate pointless and confusing do_vfs_lock wrappersJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22nfs: the length argument to read_buf should be unsignedJeff Layton
Since it gets passed through to xdr_inline_decode, we might as well have read_buf expect what it expects -- a size_t. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22ext4: allow DAX writeback for hole punchRoss Zwisler
Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-22jbd2: fix lockdep annotation in add_transaction_credits()Jan Kara
Thomas has reported a lockdep splat hitting in add_transaction_credits(). The problem is that that function calls jbd2_might_wait_for_commit() while holding j_state_lock which is wrong (we do not really wait for transaction commit while holding that lock). Fix the problem by moving jbd2_might_wait_for_commit() into places where we are ready to wait for transaction commit and thus j_state_lock is unlocked. Cc: stable@vger.kernel.org Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-22fs/locks: Use percpu_down_read_preempt_disable()Peter Zijlstra
Avoid spurious preemption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22fs/locks: Replace lg_local with a per-cpu spinlockPeter Zijlstra
As Oleg suggested, replace file_lock_list with a structure containing the hlist head and a spinlock. This completely removes the lglock from fs/locks. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22fs/locks: Replace lg_global with a percpu-rwsemPeter Zijlstra
Replace the global part of the lglock with a percpu-rwsem. Since fcl_lock is a spinlock and itself nests under i_lock, which too is a spinlock we cannot acquire sleeping locks at locks_{insert,remove}_global_locks(). We can however wrap all fcl_lock acquisitions with percpu_down_read such that all invocations of locks_{insert,remove}_global_locks() have that read lock held. This allows us to replace the lg_global part of the lglock with the write side of the rwsem. In the absense of writers, percpu_{down,up}_read() are free of atomic instructions. This further avoids the very long preempt-disable regions caused by lglock on larger machines. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22fs: Avoid premature clearing of capabilitiesJan Kara
Currently, notify_change() clears capabilities or IMA attributes by calling security_inode_killpriv() before calling into ->setattr. Thus it happens before any other permission checks in inode_change_ok() and user is thus allowed to trigger clearing of capabilities or IMA attributes for any file he can look up e.g. by calling chown for that file. This is unexpected and can lead to user DoSing a system. Fix the problem by calling security_inode_killpriv() at the end of inode_change_ok() instead of from notify_change(). At that moment we are sure user has permissions to do the requested change. References: CVE-2015-1350 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22fs: Give dentry to inode_change_ok() instead of inodeJan Kara
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22fuse: Propagate dentry down to inode_change_ok()Jan Kara
To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. Propagate it down to fuse_do_setattr(). Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22ceph: Propagate dentry down to inode_change_ok()Jan Kara
To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. ceph_setattr() has the dentry easily available but __ceph_setattr() is also called from ceph_set_acl() where dentry is not easily available. Luckily that call path does not need inode_change_ok() to be called anyway. So reorganize functions a bit so that inode_change_ok() is called only from paths where dentry is available. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22xfs: Propagate dentry down to inode_change_ok()Jan Kara
To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. Propagate dentry down to functions calling inode_change_ok(). This is rather straightforward except for xfs_set_mode() function which does not have dentry easily available. Luckily that function does not call inode_change_ok() anyway so we just have to do a little dance with function prototypes. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22posix_acl: Clear SGID bit when setting file permissionsJan Kara
When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2016-09-21btrfs: ensure that file descriptor used with subvol ioctls is a dirJeff Mahoney
If the subvol/snapshot create/destroy ioctls are passed a regular file with execute permissions set, we'll eventually Oops while trying to do inode->i_op->lookup via lookup_one_len. This patch ensures that the file descriptor refers to a directory. Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules) Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl) Cc: <stable@vger.kernel.org> #v2.6.29+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-09-21Btrfs: handle quota reserve failure properlyJosef Bacik
btrfs/022 was spitting a warning for the case that we exceed the quota. If we fail to make our quota reservation we need to clean up our data space reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-09-21gfs2: fix to detect failure of register_shrinkerChao Yu
register_shrinker can fail after commit 1d3d4437eae1 ("vmscan: per-node deferred work"), we should detect the failure of it, otherwise we may fail to register shrinker after gfs2 module was been inited successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-21orangefs: bump minimum userspace versionMartin Brandenburg
OrangeFS 2.9.6 was released without support for the features op. Thus OrangeFS 2.9.7 will be required to use it. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-09-21ovl: Fix info leak in ovl_lookup_temp()Richard Weinberger
The function uses the memory address of a struct dentry as unique id. While the address-based directory entry is only visible to root it is IMHO still worth fixing since the temporary name does not have to be a kernel address. It can be any unique number. Replace it by an atomic integer which is allowed to wrap around. Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v3.18+ Fixes: e9be9d5e76e3 ("overlay filesystem")
2016-09-21debugfs: introduce a public file_operations accessorChristian Lamparter
This patch introduces an accessor which can be used by the users of debugfs (drivers, fs, ...) to get the original file_operations struct. It also removes the REAL_FOPS_DEREF macro in file.c and converts the code to use the public version. Previously, REAL_FOPS_DEREF was only available within the file.c of debugfs. But having a public getter available for debugfs users is important as some drivers (carl9170 and b43) use the pointer of the original file_operations in conjunction with container_of() within their debugfs implementations. Reviewed-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Cc: stable <stable@vger.kernel.org> # 4.7+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-20fs/proc/kcore.c: Add bounce buffer for ktext dataJiri Olsa
We hit hardened usercopy feature check for kernel text access by reading kcore file: usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes) kernel BUG at mm/usercopy.c:75! Bypassing this check for kcore by adding bounce buffer for ktext data. Reported-by: Steve Best <sbest@redhat.com> Fixes: f5509cc18daa ("mm: Hardened usercopy") Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-20fs/proc/kcore.c: Make bounce buffer global for readJiri Olsa
Next patch adds bounce buffer for ktext area, so it's convenient to have single bounce buffer for both vmalloc/module and ktext cases. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-20Merge branch 'efi/urgent' into efi/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20nfs: cover ->migratepage with CONFIG_MIGRATIONChao Yu
It will be more clean to use CONFIG_MIGRATION to cover nfs' private .migratepage in nfs_file_aops like we do in other part of nfs operations. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-20Merge branch 'linus' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-19Revert "ocfs2: bump up o2cb network protocol version"Junxiao Bi
This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol version"). This commit made rolling upgrade fail. When one node is upgraded to new version with this commit, the remaining nodes will fail to establish connections to it, then the application like VMs on the remaining nodes can't be live migrated to the upgraded one. This will cause an outage. Since negotiate hb timeout behavior didn't change without this commit, so revert it. Fixes: 38b52efd218bf ("ocfs2: bump up o2cb network protocol version") Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ocfs2: fix start offset to ocfs2_zero_range_for_truncate()Ashish Samant
If we punch a hole on a reflink such that following conditions are met: 1. start offset is on a cluster boundary 2. end offset is not on a cluster boundary 3. (end offset is somewhere in another extent) or (hole range > MAX_CONTIG_BYTES(1MB)), we dont COW the first cluster starting at the start offset. But in this case, we were wrongly passing this cluster to ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster in place and zero it in the source too. Fix this by skipping this cluster in such a scenario. To reproduce: 1. Create a random file of say 10 MB xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile 2. Reflink it reflink -f 10MBfile reflnktest 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can also use a range that will put the end offset in another extent. fallocate -p -o 0 -l 1048615 reflnktest 4. sync 5. Check the first cluster in the source file. (It will be zeroed out). dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reported-by: Saar Maoz <saar.maoz@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ocfs2: fix double unlock in case retry after free truncate logJoseph Qi
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to free truncate log and then retry. Since ocfs2_try_to_free_truncate_log will lock/unlock global bitmap inode, we have to unlock it before calling this function. But when retry reserve and it fails with no global bitmap inode lock taken, it will unlock again in error handling branch and BUG. This issue also exists if no need retry and then ocfs2_inode_lock fails. So fix it. Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log") Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19fanotify: fix list corruption in fanotify_get_response()Jan Kara
fanotify_get_response() calls fsnotify_remove_event() when it finds that group is being released from fanotify_release() (bypass_perm is set). However the event it removes need not be only in the group's notification queue but it can have already moved to access_list (userspace read the event before closing the fanotify instance fd) which is protected by a different lock. Thus when fsnotify_remove_event() races with fanotify_release() operating on access_list, the list can get corrupted. Fix the problem by moving all the logic removing permission events from the lists to one place - fanotify_release(). Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19fsnotify: add a way to stop queueing events on group shutdownJan Kara
Implement a function that can be called when a group is being shutdown to stop queueing new events to the group. Fanotify will use this. Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ocfs2: fix trans extend while free cached blocksJunxiao Bi
The root cause of this issue is the same with the one fixed by the last patch, but this time credits for allocator inode and group descriptor may not be consumed before trans extend. The following error was caught: WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]() Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 2037 Comm: rm Tainted: G W 4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016 Call Trace: dump_stack+0x48/0x5c warn_slowpath_common+0x95/0xe0 warn_slowpath_null+0x1a/0x20 start_this_handle+0x4c3/0x510 [jbd2] jbd2__journal_restart+0x161/0x1b0 [jbd2] jbd2_journal_restart+0x13/0x20 [jbd2] ocfs2_extend_trans+0x74/0x220 [ocfs2] ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2] ocfs2_run_deallocs+0x70/0x270 [ocfs2] ocfs2_commit_truncate+0x474/0x6f0 [ocfs2] ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2] ocfs2_wipe_inode+0x136/0x6a0 [ocfs2] ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2] ocfs2_evict_inode+0x28/0x60 [ocfs2] evict+0xab/0x1a0 iput_final+0xf6/0x190 iput+0xc8/0xe0 do_unlinkat+0x1b7/0x310 SyS_unlinkat+0x22/0x40 system_call_fastpath+0x12/0x71 ---[ end trace a62437cb060baa71 ]--- JBD2: rm wants too many credits (149 > 128) Link: http://lkml.kernel.org/r/1473674623-11810-2-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ocfs2: fix trans extend while flush truncate logJunxiao Bi
Every time, ocfs2_extend_trans() included a credit for truncate log inode, but as that inode had been managed by jbd2 running transaction first time, it will not consume that credit until jbd2_journal_restart(). Since total credits to extend always included the un-consumed ones, there will be more and more un-consumed credit, at last jbd2_journal_restart() will fail due to credit number over the half of max transction credit. The following error was caught when unlinking a large file with many extents: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]() Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 13626 Comm: unlink Tainted: G W 4.1.12-37.6.3.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016 Call Trace: dump_stack+0x48/0x5c warn_slowpath_common+0x95/0xe0 warn_slowpath_null+0x1a/0x20 start_this_handle+0x4c3/0x510 [jbd2] jbd2__journal_restart+0x161/0x1b0 [jbd2] jbd2_journal_restart+0x13/0x20 [jbd2] ocfs2_extend_trans+0x74/0x220 [ocfs2] ocfs2_replay_truncate_records+0x93/0x360 [ocfs2] __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2] ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2] ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2] ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2] ocfs2_wipe_inode+0x136/0x6a0 [ocfs2] ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2] ocfs2_evict_inode+0x28/0x60 [ocfs2] evict+0xab/0x1a0 iput_final+0xf6/0x190 iput+0xc8/0xe0 do_unlinkat+0x1b7/0x310 SyS_unlink+0x16/0x20 system_call_fastpath+0x12/0x71 ---[ end trace 28aa7410e69369cf ]--- JBD2: unlink wants too many credits (251 > 128) Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ipc/shm: fix crash if CONFIG_SHMEM is not setKirill A. Shutemov
Commit c01d5b300774 ("shmem: get_unmapped_area align huge page") makes use of shm_get_unmapped_area() in shm_file_operations() unconditional to CONFIG_MMU. As Tony Battersby pointed this can lead NULL-pointer dereference on machine with CONFIG_MMU=y and CONFIG_SHMEM=n. In this case ipc/shm is backed by ramfs which doesn't provide f_op->get_unmapped_area for configurations with MMU. The solution is to provide dummy f_op->get_unmapped_area for ramfs when CONFIG_MMU=y, which just call current->mm->get_unmapped_area(). Fixes: c01d5b300774 ("shmem: get_unmapped_area align huge page") Link: http://lkml.kernel.org/r/20160912102704.140442-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Tony Battersby <tonyb@cybernetics.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19autofs: use dentry flags to block walks during expireIan Kent
Somewhere along the way the autofs expire operation has changed to hold a spin lock over expired dentry selection. The autofs indirect mount expired dentry selection is complicated and quite lengthy so it isn't appropriate to hold a spin lock over the operation. Commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") added a might_sleep() to dput() causing a WARN_ONCE() about this usage to be issued. But the spin lock doesn't need to be held over this check, the autofs dentry info. flags are enough to block walks into dentrys during the expire. I've left the direct mount expire as it is (for now) because it is much simpler and quicker than the indirect mount expire and adding spin lock release and re-aquires would do nothing more than add overhead. Fixes: 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: Takashi Iwai <tiwai@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19ocfs2/dlm: fix race between convert and migrationJoseph Qi
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") checks if lockres master has changed to identify whether new master has finished recovery or not. This will introduce a race that right after old master does umount ( means master will change), a new convert request comes. In this case, it will reset lockres state to DLM_RECOVERING and then retry convert, and then fail with lockres->l_action being set to OCFS2_AST_INVALID, which will cause inconsistent lock level between ocfs2 and dlm, and then finally BUG. Since dlm recovery will clear lock->convert_pending in dlm_move_lockres_to_recovery_list, we can use it to correctly identify the race case between convert and recovery. So fix it. Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19pnfs: add a new mechanism to select a layout driver according to an ordered listJeff Layton
Currently, the layout driver selection code always chooses the first one from the list. That's not really ideal however, as the server can send the list of layout types in any order that it likes. It's up to the client to select the best one for its needs. This patch adds an ordered list of preferred driver types and has the selection code sort the list of available layout drivers according to it. Any unrecognized layout type is sorted to the end of the list. For now, the order of preference is hardcoded, but it should be possible to make this configurable in the future. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS pnfs data server multipath session trunkingAndy Adamson
Try all multipath addresses for a data server. The first address that successfully connects and creates a session is the DS mount address. All subsequent addresses are tested for session trunking and added as aliases. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS test session trunking with exchange idAndy Adamson
Use an async exchange id call to test for session trunking To conform with RFC 5661 section 18.35.4, the Non-Update on Existing Clientid case, save the exchange id verifier in cl_confirm and use it for the session trunking exhange id test. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS add xprt switch addrs test to match clientAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS detect session trunkingAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS refactor nfs4_check_serverowner_major_idAndy Adamson
For session trunking, to compare nfs41_exchange_id_res with existing nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS refactor nfs4_match_clientidsAndy Adamson
For session trunking, to compare nfs41_exchange_id_res with exiting nfs_client. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFS setup async exchange_idAndy Adamson
Testing an rpc_xprt for session trunking should not delay application progress over already established transports. Setup exchange_id to be able to be an async call to test an rpc_xprt for session trunking use. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFSv4.x: Add kernel parameter to control the callback serverTrond Myklebust
Add support for the kernel parameter nfs.callback_nr_threads to set the number of threads that will be assigned to the callback channel. Add support for the kernel parameter nfs.nfs.max_session_cb_slots to set the maximum size of the callback channel slot table. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFSv4.x: Switch to using svc_set_num_threads() to manage the callback threadsTrond Myklebust
This will allow us to bump the number of callback threads at will. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFSv4.x: Fix up the global tracking of the callback serverTrond Myklebust
Ensure that the nfs_callback_info[] array correctly tracks the struct svc_serv. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19SUNRPC: Initialise struct svc_serv backchannel fields during __svc_create()Trond Myklebust
Clean up. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19NFSv4.x: Set up struct svc_serv_ops for the callback channelTrond Myklebust
In order to manage the threads using svc_set_num_threads, we need to fill in a few extra fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19pnfs: track multiple layout types in fsinfo structureJeff Layton
Current NFSv4.1/pNFS client assumes that MDS supports only one layout type. While it's true for most existing servers, nevertheless, this can be change in the near future. For now, this patch just plumbs in the ability to track a list of layouts in the fsinfo structure. The existing behavior of the client is preserved, by having it just select the first entry in the list. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19ovl: during copy up, switch to mounter's creds earlyVivek Goyal
Now, we have the notion that copy up of a file is done with the creds of mounter of overlay filesystem (as opposed to task). Right now before we switch creds, we do some vfs_getattr() operations in the context of task and that itself can fail. We should do that getattr() using the creds of mounter instead. So this patch switches to mounter's creds early during copy up process so that even vfs_getattr() is done with mounter's creds. Do not call revert_creds() unless we have already called ovl_override_creds(). [Reported by Arnd Bergmann] Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>