summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-09-07NFSv4: Disallow security negotiation for lookups when 'sec=' is specifiedTrond Myklebust
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR flag. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 2 (of many) from Al Viro: "Mostly Miklos' series this time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify dcache.c inlined helpers where possible fuse: drop dentry on failed revalidate fuse: clean up return in fuse_dentry_revalidate() fuse: use d_materialise_unique() sysfs: use check_submounts_and_drop() nfs: use check_submounts_and_drop() gfs2: use check_submounts_and_drop() afs: use check_submounts_and_drop() vfs: check unlinked ancestors before mount vfs: check submounts and drop atomically vfs: add d_walk() vfs: restructure d_genocide()
2013-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
2013-09-07Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Nothing major for this kernel, just maintenance updates" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits) apparmor: add the ability to report a sha1 hash of loaded policy apparmor: export set of capabilities supported by the apparmor module apparmor: add the profile introspection file to interface apparmor: add an optional profile attachment string for profiles apparmor: add interface files for profiles and namespaces apparmor: allow setting any profile into the unconfined state apparmor: make free_profile available outside of policy.c apparmor: rework namespace free path apparmor: update how unconfined is handled apparmor: change how profile replacement update is done apparmor: convert profile lists to RCU based locking apparmor: provide base for multiple profiles to be replaced at once apparmor: add a features/policy dir to interface apparmor: enable users to query whether apparmor is enabled apparmor: remove minimum size check for vmalloc() Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes Smack: network label match fix security: smack: add a hash table to quicken smk_find_entry() security: smack: fix memleak in smk_write_rules_list() xattr: Constify ->name member of "struct xattr". ...
2013-09-07NFSv4: Fix security auto-negotiationTrond Myklebust
NFSv4 security auto-negotiation has been broken since commit 4580a92d44e2b21c2254fa5fef0f1bfb43c82318 (NFS: Use server-recommended security flavor by default (NFSv3)) because nfs4_try_mount() will automatically select AUTH_SYS if it sees no auth flavours. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com>
2013-09-07NFS: Clean up nfs_parse_security_flavors()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFS: Clean up the auth flavour array messTrond Myklebust
What is the point of having a 'auth_flavor_len' field, if it is always set to 1, and can't be used to determine if the user has selected an auth flavour? This cleanup goes back to using auth_flavor_len for its original intended purpose, and gets rid of the ad-hoc replacements. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07um: hostfs: Fix writebackRichard Weinberger
We have to implement ->release() and trigger writeback from it. Otherwise we might lose dirty pages at munmap(). Signed-off-by: Richard Weinberger <richard@nod.at>
2013-09-06ceph: use d_invalidate() to invalidate aliasesYan, Zheng
d_invalidate() is the standard VFS method to invalidate dentry. compare to d_delete(), it also try shrinking children dentries. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-09-06ceph: remove ceph_lookup_inode()Yan, Zheng
commit 6f60f889 (ceph: fix freeing inode vs removing session caps race) introduced ceph_lookup_inode(). But there is already a ceph_find_inode() which provides similar function. So remove ceph_lookup_inode(), use ceph_find_inode() instead. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <alex.elder@linary.org> Reviewed-by: Sage Weil <sage@inktank.com>
2013-09-06NFSv4.1 Use MDS auth flavor for data server connectionAndy Adamson
Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible" uses the nfs_client cl_rpcclient for all state management operations, and will use krb5i or auth_sys with no regard to the mount command authflavor choice. The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all non-state management operations with a different nfs_server for each fsid encountered traversing the mount point, each with a potentially different auth flavor. pNFS data servers are not mounted in the normal sense as there is no associated nfs_server structure. Data servers can also export multiple fsids, each with a potentially different auth flavor. Data servers need to use the same authflavor as the MDS server rpc client for non-state management operations. Populate a list of rpc clients with the MDS server rpc client auth flavor for the DS to use. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06ceph: trivial buildbot warnings fixMilosz Tanski
The linux-next build bot found a three of warnings, this addresses all of them. * non-ANSI function declaration of function 'ceph_fscache_register' and 'ceph_fscache_unregister' * symbol 'ceph_cache_netfs' was not declared, now it's extern in the header. * warning: "pr_fmt" redefined Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06ceph: Do not do invalidate if the filesystem is mounted nofscMilosz Tanski
Previously we would always try to enqueue work even if the filesystem is not mounted with fscache enabled (or the file has no cookie). In the case of the filesystem mouned nofsc (but with fscache compiled in) this would lead to a crash. Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06ceph: page still marked private_2Milosz Tanski
Previous patch that allowed us to cleanup most of the issues with pages marked as private_2 when calling ceph_readpages. However, there seams to be a case in the error case clean up in start read that still trigers this from time to time. I've only seen this one a couple times. BUG: Bad page state in process petabucket pfn:335b82 page:ffffea000cd6e080 count:0 mapcount:0 mapping: (null) index:0x0 page flags: 0x200000000001000(private_2) Call Trace: [<ffffffff81563442>] dump_stack+0x46/0x58 [<ffffffff8112c7f7>] bad_page+0xc7/0x120 [<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120 [<ffffffff8112e580>] free_hot_cold_page+0x40/0x160 [<ffffffff81132427>] __put_single_page+0x27/0x30 [<ffffffff81132d95>] put_page+0x25/0x40 [<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph] [<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260 Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06ceph: ceph_readpage_to_fscache didn't check if markedMilosz Tanski
Previously ceph_readpage_to_fscache did not call if page was marked as cached before calling fscache_write_page resulting in a BUG inside of fscache. FS-Cache: Assertion failed ------------[ cut here ]------------ kernel BUG at fs/fscache/page.c:874! invalid opcode: 0000 [#1] SMP Call Trace: [<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph] [<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph] [<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph] [<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700 [<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph] Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06ceph: clean PgPrivate2 on returning from readpagesMilosz Tanski
In some cases the ceph readapages code code bails without filling all the pages already marked by fscache. When we return back to readahead code this causes a BUG. Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06ceph: use fscache as a local presisent cacheMilosz Tanski
Adding support for fscache to the Ceph filesystem. This would bring it to on par with some of the other network filesystems in Linux (like NFS, AFS, etc...) In order to mount the filesystem with fscache the 'fsc' mount option must be passed. Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06Merge tag 'fscache-fixes-for-ceph' into wip-fscacheMilosz Tanski
Patches for Ceph FS-Cache support
2013-09-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...
2013-09-06Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3, reiserfs, udf & isofs fixes from Jan Kara: "The contains a bunch of ext3 cleanups and minor improvements, major reiserfs locking changes which should hopefully fix deadlocks introduced by BKL removal, and udf/isofs changes to refuse mounting fs rw instead of mounting it ro automatically which makes eject button work as expected for all media (see the changelog for why userspace should be ok with this change)" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: use a single printk for jbd_debug() reiserfs: locking, release lock around quota operations reiserfs: locking, handle nested locks properly reiserfs: locking, push write lock out of xattr code jbd: relocate assert after state lock in journal_commit_transaction() udf: Refuse RW mount of the filesystem instead of making it RO udf: Standardize return values in mount sequence isofs: Refuse RW mount of the filesystem instead of making it RO ext3: allow specifying external journal by pathname mount option jbd: remove unneeded semicolon
2013-09-06Merge tag 'for-f2fs-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches: - support inline xattrs - add sysfs support to control GCs explicitly - add proc entry to show the current segment usage information - improve the GC/SSR performance The other bug fixes are as follows: - avoid the overflow on status calculation - fix some error handling routines - fix inconsistent xattr states after power-off-recovery - fix incorrect xattr node offset definition - fix deadlock condition in fsync - fix the fdatasync routine for power-off-recovery" * tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: optimize gc for better performance f2fs: merge more bios of node block writes f2fs: avoid an overflow during utilization calculation f2fs: trigger GC when there are prefree segments f2fs: use strncasecmp() simplify the string comparison f2fs: fix omitting to update inode page f2fs: support the inline xattrs f2fs: add the truncate_xattr_node function f2fs: introduce __find_xattr for readability f2fs: reserve the xattr space dynamically f2fs: add flags for inline xattrs f2fs: fix error return code in init_f2fs_fs() f2fs: fix wrong BUG_ON condition f2fs: fix memory leak when init f2fs filesystem fail f2fs: fix a compound statement label error f2fs: avoid writing inode redundantly when creating a file f2fs: alloc_page() doesn't return an ERR_PTR f2fs: should cover i_xattr_nid with its xattr node page lock f2fs: check the free space first in new_node_page f2fs: clean up the needless end 'return' of void function ...
2013-09-06NFS: Don't check lock owner compatability unless file is locked (part 2)Trond Myklebust
When coalescing requests into a single READ or WRITE RPC call, and there is no file locking involved, we don't have to refuse coalescing for requests where the lock owner information doesn't match. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06fscache: Netfs function for cleanup post readpagesMilosz Tanski
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc inside the aops readpages callback. It marks all the pages in the list provided by readahead with PG_private_2. In the cases that the netfs fails to read all the pages (which is legal) it ends up returning to the readahead and triggering a BUG. This happens because the page list still contains marked pages. This patch implements a simple fscache_readpages_cancel function that the netfs should call before returning from readpages. It will revoke the pages from the underlying cache backend and unmark them. The problem was originally worked out in the Ceph devel tree, but it also occurs in CIFS. It appears that NFS, AFS and 9P are okay as read_cache_pages() will clean up the unprocessed pages in the case of an error. This can be used to address the following oops: [12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping: (null) index:0x0 [12410647.597298] page flags: 0x200000000001000(private_2) ... [12410647.597334] Call Trace: [12410647.597345] [<ffffffff815523f2>] dump_stack+0x19/0x1b [12410647.597356] [<ffffffff8111def7>] bad_page+0xc7/0x120 [12410647.597359] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120 [12410647.597361] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170 [12410647.597363] [<ffffffff81123507>] __put_single_page+0x27/0x30 [12410647.597365] [<ffffffff81123df5>] put_page+0x25/0x40 [12410647.597376] [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph] [12410647.597379] [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260 [12410647.597382] [<ffffffff81122ea1>] ra_submit+0x21/0x30 [12410647.597384] [<ffffffff81118f64>] filemap_fault+0x254/0x490 [12410647.597387] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0 [12410647.597391] [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0 [12410647.597395] [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0 [12410647.597398] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930 [12410647.597401] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110 [12410647.597403] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10 [12410647.597405] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [12410647.597407] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370 [12410647.597411] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30 [12410647.597414] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550 [12410647.597418] [<ffffffff8108011d>] ? up_write+0x1d/0x20 [12410647.597422] [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0 [12410647.597425] [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240 [12410647.597427] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10 [12410647.597431] [<ffffffff81558818>] page_fault+0x28/0x30 Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06CacheFiles: Implement interface to check cache consistencyDavid Howells
Implement the FS-Cache interface to check the consistency of a cache object in CacheFiles. Original-author: Hongyi Jia <jiayisuse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com>
2013-09-06FS-Cache: Add interface to check consistency of a cached objectDavid Howells
Extend the fscache netfs API so that the netfs can ask as to whether a cache object is up to date with respect to its corresponding netfs object: int fscache_check_consistency(struct fscache_cookie *cookie) This will call back to the netfs to check whether the auxiliary data associated with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it may also return -ENOMEM and -ERESTARTSYS. The backends now have to implement a mandatory operation pointer: int (*check_consistency)(struct fscache_object *object) that corresponds to the above API call. FS-Cache takes care of pinning the object and the cookie in memory and managing this call with respect to the object state. Original-author: Hongyi Jia <jiayisuse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com>
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc changes from David Miller: "Several bug fixes (from Kirill Tkhai, Geery Uytterhoeven, and Alexey Dobriyan) and some support for Fujitsu sparc64x chips (from Allen Pais)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Export flush_ptrace_access() (needed by lustre) sparc: fix PCI device proc file mmap(2) sparc64: Remove RWSEM export leftovers sparc64: Fix off by one in trampoline TLB mapping installation loop. sparc64: Fix ITLB handler of null page esp_scsi: Fix tag state corruption when autosensing. sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall sparc64: cleanup: Rename ret_from_syscall to ret_from_fork sparc32: Fix exit flag passed from traced sys_sigreturn sparc64: Fix wrong syscall return value passed to trace_sys_exit() support sparc64x chip type in cpumap.c cpu hw caps support for sparc64x
2013-09-05NFS: Don't check lock owner compatibility in writes unless file is lockedTrond Myklebust
If we're doing buffered writes, and there is no file locking involved, then we don't have to worry about whether or not the lock owner information is identical. By relaxing this check, we ensure that fork()ed child processes can write to a page without having to first sync dirty data that was written by the parent to disk. Reported-by: Quentin Barnes <qbarnes@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Quentin Barnes <qbarnes@gmail.com>
2013-09-05fuse: drop dentry on failed revalidateAnand Avati
Drop a subtree when we find that it has moved or been delated. This can be done as long as there are no submounts under this location. If the directory was moved and we come across the same directory in a future lookup it will be reconnected by d_materialise_unique(). Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05fuse: clean up return in fuse_dentry_revalidate()Miklos Szeredi
On errors unrelated to the filesystem's state (ENOMEM, ENOTCONN) return the error itself from ->d_revalidate() insted of returning zero (invalid). Also make a common label for invalidating the dentry. This will be used by the next patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05fuse: use d_materialise_unique()Miklos Szeredi
Use d_materialise_unique() instead of d_splice_alias(). This allows dentry subtrees to be moved to a new place if there moved, even if something is referencing a dentry in the subtree (open fd, cwd, etc..). This will also allow us to drop a subtree if it is found to be replaced by something else. In this case the disconnected subtree can later be reconnected to its new location. d_materialise_unique() ensures that a directory entry only ever has one alias. We keep fc->inst_mutex around the calls for d_materialise_unique() on directories to prevent a race with mkdir "stealing" the inode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05sysfs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05nfs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05gfs2: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05afs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: check unlinked ancestors before mountMiklos Szeredi
We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Nor do we prevent mounts to be added to the disconnected subtree using relative paths after the d_drop(). This patch fixes these issues by checking for unlinked (unhashed, non-root) ancestors before proceeding with the mount. This is done with rename seqlock taken for write and with ->d_lock grabbed on each ancestor in turn, including our dentry itself. This ensures that the only one of check_submounts_and_drop() or has_unlinked_ancestor() can succeed. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: check submounts and drop atomicallyMiklos Szeredi
We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Process A: have_submounts() -> returns false Process B: mount() -> success Process A: d_drop() This patch prepares the ground for the fix by doing the following operations all under the same rename lock: have_submounts() shrink_dcache_parent() d_drop() This is actually an optimization since have_submounts() and shrink_dcache_parent() both traverse the same dentry tree separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: David Howells <dhowells@redhat.com> CC: Steven Whitehouse <swhiteho@redhat.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: add d_walk()Miklos Szeredi
This one replaces three instances open coded tree walking (have_submounts, select_parent, d_genocide) with a common helper. In addition to slightly reducing the kernel size, this simplifies the callers and makes them less bug prone. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: restructure d_genocide()Miklos Szeredi
It shouldn't matter when we decrement the refcount during the walk as long as we do it exactly once. Restructure d_genocide() to do the killing on entering the dentry instead of when leaving it. This helps creating a common helper for tree walking. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05sparc: fix PCI device proc file mmap(2)Alexey Dobriyan
Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries" must have broken mmapping of PCI device proc files on Sparc. Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area. Add wrapper around ->get_unmapped_area. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "Unfortunately, this merge window it'll have a be a lot of small piles - my fault, actually, for not keeping #for-next in anything that would resemble a sane shape ;-/ This pile: assorted fixes (the first 3 are -stable fodder, IMO) and cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last components) + several long-standing patches from various folks. There definitely will be a lot more (starting with Miklos' check_submount_and_drop() series)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) direct-io: Handle O_(D)SYNC AIO direct-io: Implement generic deferred AIO completions add formats for dentry/file pathnames kvm eventfd: switch to fdget powerpc kvm: use fdget switch fchmod() to fdget switch epoll_ctl() to fdget switch copy_module_from_fd() to fdget git simplify nilfs check for busy subtree ibmasmfs: don't bother passing superblock when not needed don't pass superblock to hypfs_{mkdir,create*} don't pass superblock to hypfs_diag_create_files don't pass superblock to hypfs_vm_create_files() oprofile: get rid of pointless forward declarations of struct super_block oprofilefs_create_...() do not need superblock argument oprofilefs_mkdir() doesn't need superblock argument don't bother with passing superblock to oprofile_create_stats_files() oprofile: don't bother with passing superblock to ->create_files() don't bother passing sb to oprofile_create_files() coh901318: don't open-code simple_read_from_buffer() ...
2013-09-05nfs4: Map NFS4ERR_WRONG_CRED to EPERMWeston Andros Adamson
Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED write and commit supportWeston Andros Adamson
WRITE and COMMIT can use the machine credential. If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED stateid supportWeston Andros Adamson
TEST_STATEID and FREE_STATEID can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED secinfo supportWeston Andros Adamson
SECINFO and SECINFO_NONAME can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED cleanup supportWeston Andros Adamson
CLOSE and LOCKU can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add state protection handlerWeston Andros Adamson
Add nfs4_state_protect - the function responsible for switching to the machine credential and the correct rpc client when SP4_MACH_CRED is in use. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Minimal SP4_MACH_CRED implementationWeston Andros Adamson
This is a minimal client side implementation of SP4_MACH_CRED. It will attempt to negotiate SP4_MACH_CRED iff the EXCHANGE_ID is using krb5i or krb5p auth. SP4_MACH_CRED will be used if the server supports the minimal operations: BIND_CONN_TO_SESSION EXCHANGE_ID CREATE_SESSION DESTROY_SESSION DESTROY_CLIENTID This patch only includes the EXCHANGE_ID negotiation code because the client will already use the machine cred for these operations. If the server doesn't support SP4_MACH_CRED or doesn't support the minimal operations, the exchange id will be resent with SP4_NONE. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05GFS2: dirty inode correctly in gfs2_write_endBenjamin Marzinski
GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when it actually increased the file size. If gfs2_fsync was called without I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before returning, so any metadata or journaled data changes were not getting fsynced. This meant that writes to the middle of files were not always getting fsynced properly. This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been updated during a write. It also make gfs2_sync flush the incore log if I_DIRTY_PAGES is set, and the file is using data journalling. This will make sure that all incore logged data gets written to disk before returning from a fsync. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05GFS2: Don't flag consistency error if first mounter is a spectatorBob Peterson
This patch checks for the first mounter being a specator. If so, it makes sure all the journals are clean. If there's a dirty journal, the mount fails. Testing results: # insmod gfs2.ko # mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2 mount: permission denied # dmesg | tail -2 [ 3390.655996] GFS2: fsid=MUSKETEER:home: Now mounting FS... [ 3390.841336] GFS2: fsid=MUSKETEER:home.s: jid=0: Journal is dirty, so the first mounter must not be a spectator. # mount -tgfs2 /dev/sasdrives/scratch /mnt/gfs2 # umount /mnt/gfs2 # mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2 # ls /mnt/gfs2|wc -l 352 # umount /mnt/gfs2 Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05f2fs: optimize gc for better performanceJin Xu
This patch improves the gc efficiency by optimizing the victim selection policy. With this optimization, the random re-write performance could increase up to 20%. For f2fs, when disk is in shortage of free spaces, gc will selects dirty segments and moves valid blocks around for making more space available. The gc cost of a segment is determined by the valid blocks in the segment. The less the valid blocks, the higher the efficiency. The ideal victim segment is the one that has the most garbage blocks. Currently, it searches up to 20 dirty segments for a victim segment. The selected victim is not likely the best victim for gc when there are much more dirty segments. Why not searching more dirty segments for a better victim? The cost of searching dirty segments is negligible in comparison to moving blocks. In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make the search more aggressively for a possible better victim. Since it also applies to victim selection for SSR, it will likely improve the SSR efficiency as well. The test case is simple. It creates as many files until the disk full. The size for each file is 32KB. Then it writes as many as 100000 records of 4KB size to random offsets of random files in sync mode. The testing was done on a 2GB partition of a SDHC card. Let's see the test result of f2fs without and with the patch. --------------------------------------- 2GB partition, SDHC create 52023 files of size 32768 bytes random re-write 100000 records of 4KB --------------------------------------- | file creation (s) | rewrite time (s) | gc count | gc garbage blocks | [no patch] 341 4227 1174 174840 [patched] 324 2958 645 106682 It's obvious that, with the patch, f2fs finishes the test in 20+% less time than without the patch. And internally it does much less gc with higher efficiency than before. Since the performance improvement is related to gc, it might not be so obvious for other tests that do not trigger gc as often as this one ( This is because f2fs selects dirty segments for SSR use most of the time when free space is in shortage). The well-known iozone test tool was not used for benchmarking the patch becuase it seems do not have a test case that performs random re-write on a full disk. This patch is the revised version based on the suggestion from Jaegeuk Kim. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: suggested simpler solution] Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>