summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-03-31locks: remove "inline" qualifier from fl_link manipulation functionsJeff Layton
It's best to let the compiler decide that. Acked-by: J. Bruce Fields <bfields@fieldses.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31locks: clean up comment typoJeff Layton
Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-03-31locks: close potential race between setlease and openJeff Layton
As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
2014-03-31GFS2: Fix return value in slot_get()Abhi Das
ENOSPC was being returned in slot_get inspite of successful execution of the function. This patch fixes this return code. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-31Merge tag 'v3.14' into drm-intel-next-queuedDaniel Vetter
Linux 3.14 The vt-d w/a merged late in 3.14-rc needs a bit of fine-tuning, hence backmerge. Conflicts: drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/intel_ddi.c drivers/gpu/drm/i915/intel_dp.c All trivial adjacent lines changed type conflicts, so trivial git doesn't even show them in the merg commit. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-31Merge remote-tracking branch 'robh/for-next' into devicetree/nextGrant Likely
2014-03-30Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and hash modifications into false negatives from __lookup_mnt() (instead of hangs)" On the false negatives from __lookup_mnt(): "The *only* thing we care about is not getting stuck in __lookup_mnt(). If it misses an entry because something in front of it just got moved around, etc, we are fine. We'll notice that mount_lock mismatch and that'll be it" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch mnt_hash to hlist don't bother with propagate_mnt() unless the target is shared keep shadowed vfsmounts together resizable namespace.c hashes
2014-03-30ext4: atomically set inode->i_flags in ext4_set_inode_flags()Theodore Ts'o
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30switch mnt_hash to hlistAl Viro
fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30don't bother with propagate_mnt() unless the target is sharedAl Viro
If the dest_mnt is not shared, propagate_mnt() does nothing - there's no mounts to propagate to and thus no copies to create. Might as well don't bother calling it in that case. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30keep shadowed vfsmounts togetherAl Viro
preparation to switching mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30resizable namespace.c hashesAl Viro
* switch allocation to alloc_large_system_hash() * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=) * switch mountpoint_hashtable from list_head to hlist_head Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcpKinglong Mee
Besides checking rpc_xprt out of xs_setup_bc_tcp, increase it's reference (it's important). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30NFSD: Clear wcc data between compound opsKinglong Mee
Testing NFS4.0 by pynfs, I got some messeages as, "nfsd: inode locked twice during operation." When one compound RPC contains two or more ops that locks the filehandle,the second op will cause the message. As two SETATTR ops, after the first SETATTR, nfsd will not call fh_put() to release current filehandle, it means filehandle have unlocked with fh_post_saved = 1. The second SETATTR find fh_post_saved = 1, and printk the message. v2: introduce helper fh_clear_wcc(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+Trond Myklebust
RFC5661 obsoletes NFS4ERR_STALE_STATEID in favour of NFS4ERR_BAD_STATEID. Note that because nfsd encodes the clientid boot time in the stateid, we can hit this error case in certain scenarios where the Linux client state management thread exits early, before it has finished recovering all state. Reported-by: Idan Kedar <idank@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: fix nfs4err_resource in 4.1 caseJ. Bruce Fields
encode_getattr, for example, can return nfserr_resource to indicate it ran out of buffer space. That's not a legal error in the 4.1 case. And in the 4.1 case, if we ran out of buffer space, we should have exceeded a session limit too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: fix setclientid encode sizeJ. Bruce Fields
Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: remove redundant check from nfsd4_check_resp_sizeJ. Bruce Fields
cstate->slot and ->session are each set together in nfsd4_sequence. If one is non-NULL, so is the other. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: use more generous NFS4_ACL_MAXJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: minor nfsd4_replay_cache_entry cleanupJ. Bruce Fields
Maybe this is comment true, who cares? Handle this like any other error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: nfsd4_replay_cache_entry should be staticJ. Bruce Fields
This isn't actually used anywhere else. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28nfsd4: update comments with obsolete function nameJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28NFSD: Using free_conn free connectionKinglong Mee
Connection from alloc_conn must be freed through free_conn, otherwise, the reference of svc_xprt will never be put. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28NFSv4: Fix a use-after-free problem in open()Trond Myklebust
If we interrupt the nfs4_wait_for_completion_rpc_task() call in nfs4_run_open_task(), then we don't prevent the RPC call from completing. So freeing up the opendata->f_attr.mdsthreshold in the error path in _nfs4_do_open() leads to a use-after-free when the XDR decoder tries to decode the mdsthreshold information from the server. Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN) Tested-by: Steve Dickson <SteveD@redhat.com> Cc: stable@vger.kernel.org # 3.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-28nfsd: typo in nfsd_rename commentJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28NFSD: simplify saved/current fh uses in nfsd4_proc_compoundKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28ocfs2: check if cluster name exists before derefSasha Levin
Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is trying to strlcpy a string which was explicitly passed as NULL in the very same patch, triggering a NULL ptr deref. BUG: unable to handle kernel NULL pointer dereference at (null) IP: strlcpy (lib/string.c:388 lib/string.c:151) CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274 RIP: strlcpy (lib/string.c:388 lib/string.c:151) Call Trace: ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350) ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396) user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679) dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503) vfs_mkdir (fs/namei.c:3467) SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472) tracesys (arch/x86/kernel/entry_64.S:749) akpm: this patch probably disables the feature. A temporary thing to avoid triviel oopses. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-28lockd: ensure we tear down any live sockets when socket creation fails ↵Jeff Layton
during lockd_up We had a Fedora ABRT report with a stack trace like this: kernel BUG at net/sunrpc/svc.c:550! invalid opcode: 0000 [#1] SMP [...] CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1 Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013 task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000 RIP: 0010:[<ffffffffa0305fa8>] [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc] RSP: 0018:ffff88003f9b9de0 EFLAGS: 00010206 RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286 RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360 R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600 R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000 FS: 00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0 Stack: ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000 ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60 ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008 Call Trace: [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd] [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd] [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50 [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd] [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd] [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50 [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20 [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0 [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd] [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0 [<ffffffff811c3f99>] ? putname+0x29/0x40 [<ffffffff811b9569>] SyS_write+0x49/0xa0 [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 RIP [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc] RSP <ffff88003f9b9de0> Evidently, we created some lockd sockets and then failed to create others. make_socks then returned an error and we tried to tear down the svc, but svc->sv_permsocks was not empty so we ended up tripping over the BUG() in svc_destroy(). Fix this by ensuring that we tear down any live sockets we created when socket creation is going to return an error. Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...) Reported-by: Raphos <raphoszap@laposte.net> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28NFSD: Traverse unconfirmed client through hash-tableKinglong Mee
When stopping nfsd, I got BUG messages, and soft lockup messages, The problem is cuased by double rb_erase() in nfs4_state_destroy_net() and destroy_client(). This patch just let nfsd traversing unconfirmed client through hash-table instead of rbtree. [ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0 [ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi [last unloaded: nfsd] [ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF O 3.14.0-rc8+ #2 [ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti: ffff8800797f6000 [ 2325.022982] RIP: 0010:[<ffffffff8133c18c>] [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] RSP: 0018:ffff8800797f7d98 EFLAGS: 00010246 [ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX: 0000000000000000 [ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI: ffff880079f4c810 [ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09: ffff88007964fc70 [ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12: ffff880079f4c800 [ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15: ffff880079f4c860 [ 2325.022982] FS: 0000000000000000(0000) GS:ffff88007f900000(0000) knlGS:0000000000000000 [ 2325.022982] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4: 00000000000006e0 [ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2325.022982] Stack: [ 2325.022982] ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8 ffff8800797f7da8 [ 2325.022982] ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0 ffff880079c1f010 [ 2325.022982] ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2 ffff8800797f7df0 [ 2325.022982] Call Trace: [ 2325.022982] [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd] [ 2325.022982] [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220 [nfsd] [ 2325.022982] [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd] [ 2325.022982] [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd] [ 2325.022982] [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc] [ 2325.022982] [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd] [ 2325.022982] [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd] [ 2325.022982] [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 2325.022982] [<ffffffff810a8232>] kthread+0xd2/0xf0 [ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 2325.022982] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0 [ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10 0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b 50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01 [ 2325.022982] RIP [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] RSP <ffff8800797f7d98> [ 2325.022982] CR2: 0000000000000000 [ 2325.022982] ---[ end trace 28c27ed011655e57 ]--- [ 228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558] [ 228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0 snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore i2c_core ata_generic pata_acpi [ 228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF O 3.14.0-rc8+ #2 [ 228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti: ffff880074616000 [ 228.064539] RIP: 0010:[<ffffffff8133ba17>] [<ffffffff8133ba17>] rb_next+0x27/0x50 [ 228.064539] RSP: 0018:ffff880074617de0 EFLAGS: 00000282 [ 228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX: 0000000000000014 [ 228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI: ffff880074478010 [ 228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09: 0000000000000012 [ 228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12: ffffea0001d11a00 [ 228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15: ffff880074617d50 [ 228.064539] FS: 0000000000000000(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000 [ 228.064539] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4: 00000000000006f0 [ 228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 228.064539] Stack: [ 228.064539] ffff880074617e28 ffffffffa01ab7db ffff880074617df0 ffff880074617df0 [ 228.064539] ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0 0000000000000000 [ 228.064539] 0000000000000000 ffff880074617e48 ffffffffa01840b8 ffffffff81cc26c0 [ 228.064539] Call Trace: [ 228.064539] [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220 [nfsd] [ 228.064539] [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd] [ 228.064539] [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd] [ 228.064539] [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc] [ 228.064539] [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd] [ 228.064539] [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd] [ 228.064539] [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 228.064539] [<ffffffff810a8232>] kthread+0xd2/0xf0 [ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 228.064539] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0 [ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b 50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7 Fixes: ac55fdc408039 (nfsd: move the confirmed and unconfirmed hlists...) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28aio: v4 ensure access to ctx->ring_pages is correctly serialised for migrationBenjamin LaHaise
As reported by Tang Chen, Gu Zheng and Yasuaki Isimatsu, the following issues exist in the aio ring page migration support. As a result, for example, we have the following problem: thread 1 | thread 2 | aio_migratepage() | |-> take ctx->completion_lock | |-> migrate_page_copy(new, old) | | *NOW*, ctx->ring_pages[idx] == old | | | *NOW*, ctx->ring_pages[idx] == old | aio_read_events_ring() | |-> ring = kmap_atomic(ctx->ring_pages[0]) | |-> ring->head = head; *HERE, write to the old ring page* | |-> kunmap_atomic(ring); | |-> ctx->ring_pages[idx] = new | | *BUT NOW*, the content of | | ring_pages[idx] is old. | |-> release ctx->completion_lock | As above, the new ring page will not be updated. Fix this issue, as well as prevent races in aio_ring_setup() by holding the ring_lock mutex during kioctx setup and page migration. This avoids the overhead of taking another spinlock in aio_read_events_ring() as Tang's and Gu's original fix did, pushing the overhead into the migration code. Note that to handle the nesting of ring_lock inside of mmap_sem, the migratepage operation uses mutex_trylock(). Page migration is not a 100% critical operation in this case, so the ocassional failure can be tolerated. This issue was reported by Sasha Levin. Based on feedback from Linus, avoid the extra taking of ctx->completion_lock. Instead, make page migration fully serialised by mapping->private_lock, and have aio_free_ring() simply disconnect the kioctx from the mapping by calling put_aio_ring_file() before touching ctx->ring_pages[]. This simplifies the error handling logic in aio_migratepage(), and should improve robustness. v4: always do mutex_unlock() in cases when kioctx setup fails. Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: stable@vger.kernel.org
2014-03-27svcrpc: explicitly reject compounds that are not padded out to 4-byte multipleJeff Layton
We have a WARN_ON in the nfsd4_decode_write() that tells us when the client has sent a request that is not padded out properly according to RFC4506. A WARN_ON really isn't appropriate in this case though since this indicates a client bug, not a server one. Move this check out to the top-level compound decoder and have it just explicitly return an error. Also add a dprintk() that shows the client address and xid to help track down clients and frames that trigger it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd: notify_change needs elevated write countJ. Bruce Fields
Looks like this bug has been here since these write counts were introduced, not sure why it was just noticed now. Thanks also to Jan Kara for pointing out the problem. Cc: stable@vger.kernel.org Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd4: fix test_stateid error reply encodingJ. Bruce Fields
If the entire operation fails then there's nothing to encode. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd4: leave reply buffer space for failed setattrJ. Bruce Fields
This fixes an ommission from 18032ca062e621e15683cb61c066ef3dc5414a7b "NFSD: Server implementation of MAC Labeling", which increased the size of the setattr error reply without increasing COMPOUND_ERR_SLACK_SPACE. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd4: make set of large acl return efbig, not resourceJ. Bruce Fields
If a client attempts to set an excessively large ACL, return NFS4ERR_FBIG instead of NFS4ERR_RESOURCE. I'm not sure FBIG is correct, but I'm positive RESOURCE is wrong (it isn't even a well-defined error any more for NFS versions since 4.1). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd4: session needs room for following op to error outJ. Bruce Fields
Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27nfsd4: buffer-length check for SUPPATTR_EXCLCREATJ. Bruce Fields
This was an omission from 8c18f2052e756e7d5dea712fc6e7ed70c00e8a39 "nfsd41: SUPPATTR_EXCLCREAT attribute". Cc: Benny Halevy <bhalevy@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27vfs: Allocate anon_inode_inode in anon_inode_init()Jan Kara
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is somewhat fragile as if that function ever gets called again, it will overwrite anon_inode_inode pointer. So move the initialization of anon_inode_inode to anon_inode_init(). Signed-off-by: Jan Kara <jack@suse.cz> [ Further simplified on suggestion from Dave Jones ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25Revert "sysfs, driver-core: remove unused ↵Greg Kroah-Hartman
{sysfs|device}_schedule_callback_owner()" This reverts commit d1ba277e79889085a2faec3b68b91ce89c63f888. As reported by Stephen, this patch breaks linux-next as a ppc patch suddenly (after 2 years) started using this old api call. So revert it for now, it will go away in 3.15-rc2 when we can change the PPC call to the new api. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-25fs: remove now stale label in anon_inode_init()Linus Torvalds
The previous commit removed the register_filesystem() call and the associated error handling, but left the label for the error path that no longer exists. Remove that too. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25fs: Avoid userspace mounting anon_inodefs filesystemJan Kara
anon_inodefs filesystem is a kernel internal filesystem userspace shouldn't mess with. Remove registration of it so userspace cannot even try to mount it (which would fail anyway because the filesystem is MS_NOUSER). This fixes an oops triggered by trinity when it tried mounting anon_inodefs which overwrote anon_inode_inode pointer while other CPU has been in anon_inode_getfile() between ihold() and d_instantiate(). Thus effectively creating dentry pointing to an inode without holding a reference to it. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix frm Bruce Fields: "J R Okajima sent this early and I was just slow to pass it along, apologies. Fortunately it's a simple fix" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: nfsd: fix lost nfserrno() call in nfsd_setattr()
2014-03-24ext4: fix comment typoMatthew Wilcox
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24ext4: make ext4_block_zero_page_range staticMatthew Wilcox
It's only called within inode.c, so make it static, remove its prototype from ext4.h and move it above all of its callers so it doesn't need a prototype within inode.c. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24ext4: atomically set inode->i_flags in ext4_set_inode_flags()Theodore Ts'o
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2014-03-24ext4: optimize Hurd tests when reading/writing inodesTheodore Ts'o
Set a in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-24VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.Rusty Russell
Summary of http://lkml.org/lkml/2014/3/14/363 : Ted: module_param(queue_depth, int, 444) Joe: 0444! Rusty: User perms >= group perms >= other perms? Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2? Side effect of stricter permissions means removing the unnecessary S_IFREG from several callers. Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair number of drivers fail this test, so that will be the debate for a future patch. Suggested-by: Joe Perches <joe@perches.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-23rcuwalk: recheck mount_lock after mountpoint crossing attemptsAl Viro
We can get false negative from __lookup_mnt() if an unrelated vfsmount gets moved. In that case legitimize_mnt() is guaranteed to fail, and we will fall back to non-RCU walk... unless we end up running into a hard error on a filesystem object we wouldn't have reached if not for that false negative. IOW, delaying that check until the end of pathname resolution is wrong - we should recheck right after we attempt to cross the mountpoint. We don't need to recheck unless we see d_mountpoint() being true - in that case even if we have just raced with mount/umount, we can simply go on as if we'd come at the moment when the sucker wasn't a mountpoint; if we run into a hard error as the result, it was a legitimate outcome. __lookup_mnt() returning NULL is different in that respect, since it might've happened due to operation on completely unrelated mountpoint. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23make prepend_name() work correctly when called with negative *buflenAl Viro
In all callchains leading to prepend_name(), the value left in *buflen is eventually discarded unused if prepend_name() has returned a negative. So we are free to do what prepend() does, and subtract from *buflen *before* checking for underflow (which turns into checking the sign of subtraction result, of course). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23vfs: Don't let __fdget_pos() get FMODE_PATH filesEric Biggers
Commit bd2a31d522344 ("get rid of fget_light()") introduced the __fdget_pos() function, which returns the resulting file pointer and fdput flags combined in an 'unsigned long'. However, it also changed the behavior to return files with FMODE_PATH set, which shouldn't happen because read(), write(), lseek(), etc. aren't allowed on such files. This commit restores the old behavior. This regression actually had no effect on read() and write() since FMODE_READ and FMODE_WRITE are not set on file descriptors opened with O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH to fail with ESPIPE rather than EBADF. Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>