summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-03-11NFS: load the rpc/rdma transport module automaticallyTom Talpey
When mounting an NFS/RDMA server with the "-o proto=rdma" or "-o rdma" options, attempt to dynamically load the necessary "xprtrdma" client transport module. Doing so improves usability, while avoiding a static module dependency and any unnecesary resources. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: Kill the "defined but not used" compile error on nommu machinesTrond Myklebust
Bryan Wu reports that when compiling NFS on nommu machines he gets a "defined but not used" error on nfs_file_mmap(). The easiest fix is simply to get rid of the special casing in NFS, and just always call generic_file_mmap() to set up the file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11Merge branch 'master' of ↵Felix Blyakher
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2009-03-11NFS: Throttle page dirtying while we're flushing to diskTrond Myklebust
The following patch is a combination of a patch by myself and Peter Staubach. Trond: If we allow other processes to dirty pages while a process is doing a consistency sync to disk, we can end up never making progress. Peter: Attached is a patch which addresses a continuing problem with the NFS client generating out of order WRITE requests. While this is compliant with all of the current protocol specifications, there are servers in the market which can not handle out of order WRITE requests very well. Also, this may lead to sub-optimal block allocations in the underlying file system on the server. This may cause the read throughputs to be reduced when reading the file from the server. Peter: There has been a lot of work recently done to address out of order issues on a systemic level. However, the NFS client is still susceptible to the problem. Out of order WRITE requests can occur when pdflush is in the middle of writing out pages while the process dirtying the pages calls generic_file_buffered_write which calls generic_perform_write which calls balance_dirty_pages_rate_limited which ends up calling writeback_inodes which ends up calling back into the NFS client to writes out dirty pages for the same file that pdflush happens to be working with. Signed-off-by: Peter Staubach <staubach@redhat.com> [modification by Trond to merge the two similar patches] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: cleanup - remove struct nfs_inode->ncommitTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: Simplify some cache consistency post-op GETATTRsTrond Myklebust
Certain asynchronous operations such as write() do not expect (or care) that other metadata such as the file owner, mode, acls, ... change. All they want to do is update and/or check the change attribute, ctime, and mtime. By skipping the file owner and group update, we also avoid having to do a potential idmapper upcall for these asynchronous RPC calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: A referral is assumed to always point to a directory.Trond Myklebust
Fix a bug whereby we would fail to create a mount point for a referral. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: Make decode_getfattr() set fattr->valid to reflect what was decodedTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: Clean up decode_getfattr()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: Fix the type of struct nfs_fattr->modeTrond Myklebust
There is no point in using anything other than umode_t, since we copy the content pretty much directly into inode->i_mode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: Shrink the struct nfs_fattrTrond Myklebust
We don't need the bitmap[] field anymore, since the 'valid' field tells us all we need to know about which attributes were filled in... Also move the pre-op attributes in order to improve the structure packing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: Support NFSv4 optional attributes in the struct nfs_fattrTrond Myklebust
Currently, filling struct nfs_fattr is more or less an all or nothing operation, since NFSv2 and NFSv3 have only mandatory attributes. In NFSv4, some attributes are optional, and so we may simply not be able to fill in those fields. Furthermore, NFSv4 allows you to specify which attributes you are interested in retrieving, thus permitting you to optimise away retrieval of attributes that you know will no change... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFSv4: Ignore errors on the post-op attributes in SETATTR callsTrond Myklebust
There is no need to fail or retry a SETATTR call just because the post-op GETATTR failed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: flush cached directory information slightly more readily.NeilBrown
If cached directory contents becomes incorrect, there is no way to flush the contents. This contrasts with files where file locking is the recommended way to ensure cache consistency between multiple applications (a read-lock always flushes the cache). Also while changes to files often change the size of the file (thus triggering a cache flush), changes to directories often do not change the apparent size (as the size is often rounded to a block size). So it is particularly important with directories to avoid the possibility of an incorrect cache wherever possible. When the link count on a directory changes it implies a change in the number of child directories, and so a change in the contents of this directory. So use that as a trigger to flush cached contents. When the ctime changes but the mtime does not, there are two possible reasons. 1/ The owner/mode information has been changed. 2/ utimes has been used to set the mtime backwards. In the first case, a data-cache flush is not required. In the second case it is. So on the basis that correctness trumps performance, flush the directory contents cache in this case also. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11NFS: Minor __nfs_revalidate_inode cleanupSuresh Jayaraman
Remove redundant NFS_STALE() check, a leftover due to the commit 691beb13cdc88358334ef0ba867c080a247a760f Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11dlm: fix length calculation in compat codeDavid Teigland
Using offsetof() to calculate name length does not work because it does not produce consistent results with with structure packing. This caused memcpy to corrupt memory by copying 4 extra bytes off the end of the buffer on 64 bit kernels with 32 bit userspace (the only case where this 32/64 compat code is used). The fix is to calculate name length directly from the start instead of trying to derive it later using count and offsetof. Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11dlm: ignore cancel on granted lockDavid Teigland
Return immediately from dlm_unlock(CANCEL) if the lock is granted and not being converted; there's nothing to cancel. Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11dlm: clear defunct cancel stateDavid Teigland
When a conversion completes successfully and finds that a cancel of the convert is still in progress (which is now a moot point), preemptively clear the state associated with outstanding cancel. That state could cause a subsequent conversion to be ignored. Also, improve the consistency and content of error and debug messages in this area. Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11dlm: replace idr with hash table for connectionsChristine Caulfield
Integer nodeids can be too large for the idr code; use a hash table instead. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11proc: fix kflags to uflags copying in /proc/kpageflagsWu Fengguang
Fix kpf_copy_bit(src,dst) to be kpf_copy_bit(dst,src) to match the actual call patterns, e.g. kpf_copy_bit(kflags, KPF_LOCKED, PG_locked). This misplacement of src/dst only affected reporting of PG_writeback, PG_reclaim and PG_buddy. For others kflags==uflags so not affected. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-10Bug 11061, NFS mounts droppedIan Dall
Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=11061 sockaddr structures can't be reliably compared using memcmp() because there are padding bytes in the structure which can't be guaranteed to be the same even when the sockaddr structures refer to the same socket. Instead compare all the relevant fields. In the case of IPv6 sin6_flowinfo is not compared because it only affects QoS and sin6_scope_id is only compared if the address is "link local" because "link local" addresses need only be unique to a specific link. Signed-off-by: Ian Dall <ian@beware.dropbear.id.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10NFS: Handle -ESTALE error in access()Suresh Jayaraman
Hi Trond, I have been looking at a bugreport where trying to open applications on KDE on a NFS mounted home fails temporarily. There have been multiple reports on different kernel versions pointing to this common issue: http://bugzilla.kernel.org/show_bug.cgi?id=12557 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/269954 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508866.html This issue can be reproducible consistently by doing this on a NFS mounted home (KDE): 1. Open 2 xterm sessions 2. From one of the xterm session, do "ssh -X <remote host>" 3. "stat ~/.Xauthority" on the remote SSH session 4. Close the two xterm sessions 5. On the server do a "stat ~/.Xauthority" 6. Now on the client, try to open xterm This will fail. Even if the filehandle had become stale, the NFS client should invalidate the cache/inode and should repeat LOOKUP. Looking at the packet capture when the failure occurs shows that there were two subsequent ACCESS() calls with the same filehandle and both fails with -ESTALE error. I have tested the fix below. Now the client issue a LOOKUP after the ACCESS() call fails with -ESTALE. If all this makes sense to you, can you consider this for inclusion? Thanks, If the server returns an -ESTALE error due to stale filehandle in response to an ACCESS() call, we need to invalidate the cache and inode so that LOOKUP() can be retried. Without this change, the nfs client retries ACCESS() with the same filehandle, fails again and could lead to temporary failure of applications running on nfs mounted home. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10NLM: Fix GRANT callback address comparison when IPv6 is enabledChuck Lever
The NFS mount command may pass an AF_INET server address to lockd. If lockd happens to be using a PF_INET6 listener, the nlm_cmp_addr() in nlmclnt_grant() will fail to match requests from that host because they will all have a mapped IPv4 AF_INET6 address. Adopt the same solution used in nfs_sockaddr_match_ipaddr() for NFSv4 callbacks: if either address is AF_INET, map it to an AF_INET6 address before doing the comparison. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10NFSv3: Fix posix ACL codeTrond Myklebust
Fix a memory leak due to allocation in the XDR layer. In cases where the RPC call needs to be retransmitted, we end up allocating new pages without clearing the old ones. Fix this by moving the allocation into nfs3_proc_setacls(). Also fix an issue discovered by Kevin Rudd, whereby the amount of memory reserved for the acls in the xdr_buf->head was miscalculated, and causing corruption. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10NFS: Fix misparsing of nfsv4 fs_locations attribute (take 2)Trond Myklebust
The changeset ea31a4437c59219bf3ea946d58984b01a45a289c (nfs: Fix misparsing of nfsv4 fs_locations attribute) causes the mountpath that is calculated at the beginning of try_location() to be clobbered when we later strncpy a non-nul terminated hostname using an incorrect buffer length. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10devpts: remove graffitiAlexey Dobriyan
Very annoying when working with containters. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-10ext4: fix header check in ext4_ext_search_right() for deep extent trees.Eric Sandeen
The ext4_ext_search_right() function is confusing; it uses a "depth" variable which is 0 at the root and maximum at the leaves, but the on-disk metadata uses a "depth" (actually eh_depth) which is opposite: maximum at the root, and 0 at the leaves. The ext4_ext_check_header() function is given a depth and checks the header agaisnt that depth; it expects the on-disk semantics, but we are giving it the opposite in the while loop in this function. We should be giving it the on-disk notion of "depth" which we can get from (p_depth - depth) - and if you look, the last (more commonly hit) call to ext4_ext_check_header() does just this. Sending in the wrong depth results in (incorrect) messages about corruption: EXT4-fs error (device sdb1): ext4_ext_search_right: bad header in inode #2621457: unexpected eh_depth - magic f30a, entries 340, max 340(0), depth 1(2) http://bugzilla.kernel.org/show_bug.cgi?id=12821 Reported-by: David Dindorp <ddi@dubex.dk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-10Merge branches 'tracing/ftrace', 'tracing/textedit' and 'linus' into ↵Ingo Molnar
tracing/core
2009-03-10Btrfs: Clear space_info full when adding new devicesChris Mason
The full flag on the space info structs tells the allocator not to try and allocate more chunks because the devices in the FS are fully allocated. When more devices are added, we need to clear the full flag so the allocator knows it has more space available. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-10Btrfs: Fix locking around adding new space_infoChris Mason
Storage allocated to different raid levels in btrfs is tracked by a btrfs_space_info structure, and all of the current space_infos are collected into a list_head. Most filesystems have 3 or 4 of these structs total, and the list is only changed when new raid levels are added or at unmount time. This commit adds rcu locking on the list head, and properly frees things at unmount time. It also clears the space_info->full flag whenever new space is added to the FS. The locking for the space info list goes like this: reads: protected by rcu_read_lock() writes: protected by the chunk_mutex At unmount time we don't need special locking because all the readers are gone. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-10Merge branches 'tracing/doc', 'tracing/ftrace', 'tracing/printk' and 'linus' ↵Ingo Molnar
into tracing/core
2009-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix spinlock assertions on UP systems
2009-03-09Btrfs: fix spinlock assertions on UP systemsChris Mason
btrfs_tree_locked was being used to make sure a given extent_buffer was properly locked in a few places. But, it wasn't correct for UP compiled kernels. This switches it to using assert_spin_locked instead, and renames it to btrfs_assert_tree_locked to better reflect how it was really being used. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-08Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix ext4_free_inode() vs. ext4_claim_inode() race
2009-03-08UBIFS: amend key_hash return valueArtem Bityutskiy
... which should be uint32_t, not int. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-03-08UBIFS: improve find function interfaceArtem Bityutskiy
Make 'ubifs_find_free_space()' return offset where free space starts, rather than the amount of free space. This is just more appropriat for its caller. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-03-06xfs: only issues a cache flush on unmount if barriers are enabledChristoph Hellwig
Currently we unconditionally issue a flush from xfs_free_buftarg, but since 2.6.29-rc1 this gives a warning in the style of end_request: I/O error, dev vdb, sector 0 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-06xfs: prevent lockdep false positive in xfs_iget_cache_missChristoph Hellwig
The inode can't be locked by anyone else as we just created it a few lines above and it's not been added to any lookup data structure yet. So use a trylock that must succeed to get around the lockdep warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-06xfs: prevent kernel crash due to corrupted inode log formatChristoph Hellwig
Andras Korn reported an oops on log replay causes by a corrupted xfs_inode_log_format_t passing a 0 size to kmem_zalloc. This patch handles to small or too large numbers of log regions gracefully by rejecting the log replay with a useful error message. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Andras Korn <korn-sgi.com@chardonnay.math.bme.hu> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-06xfs: include header files for prototypesHannes Eder
Fix this sparse warnings: fs/xfs/linux-2.6/xfs_ioctl.c:72:1: warning: symbol 'xfs_find_handle' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:249:1: warning: symbol 'xfs_open_by_handle' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:361:1: warning: symbol 'xfs_readlink_by_handle' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:496:1: warning: symbol 'xfs_attrmulti_attr_get' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:525:1: warning: symbol 'xfs_attrmulti_attr_set' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:555:1: warning: symbol 'xfs_attrmulti_attr_remove' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:657:1: warning: symbol 'xfs_ioc_space' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_ioctl.c:1340:1: warning: symbol 'xfs_file_ioctl' was not declared. Should it be static? fs/xfs/support/debug.c:65:1: warning: symbol 'xfs_fs_vcmn_err' was not declared. Should it be static? fs/xfs/support/debug.c:112:1: warning: symbol 'xfs_hex_dump' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-06xfs: make symbols staticHannes Eder
Instead of the keyword 'static' the macro 'STATIC' is used, so the symbols are still global with CONFIG_XFS_DEBUG. Fix this sparse warnings: fs/xfs/linux-2.6/xfs_super.c:638:1: warning: symbol 'xfs_blkdev_get' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_super.c:655:1: warning: symbol 'xfs_blkdev_put' was not declared. Should it be static? fs/xfs/linux-2.6/xfs_super.c:876:1: warning: symbol 'xfsaild' was not declared. Should it be static? fs/xfs/xfs_bmap.c:6208:1: warning: symbol 'xfs_check_block' was not declared. Should it be static? fs/xfs/xfs_dir2_leaf.c:553:1: warning: symbol 'xfs_dir2_leaf_check' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-06xfs: move declaration to header fileHannes Eder
Fix this sparse warning: fs/xfs/xfs_da_btree.c:1550:26: warning: symbol 'xfs_default_nameops' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-03-05Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c
2009-03-05Squashfs: frag_size should be signed, as it can hold an error resultRoel Kluin
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2009-03-04ext4: Use struct flex_groups to calculate get_orlov_stats()Theodore Ts'o
Instead of looping over all of the block groups in a flex group summing their summary statistics, start tracking used_dirs in struct flex_groups, and use struct flex_groups instead. This should save a bit of CPU for mkdir-heavy workloads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-05Squashfs: Fix oops when reading fsfuzzer corrupted filesystemsPhillip Lougher
This fixes a code regression caused by the recent mainlining changes. The recent code changes call zlib_inflate repeatedly, decompressing into separate 4K buffers, this code didn't check for the possibility that zlib_inflate might ask for too many buffers when decompressing corrupted data. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2009-03-04ext4: Use atomic_t's in struct flex_groupsTheodore Ts'o
Reduce pressure on the sb_bgl_lock family of locks by using atomic_t's to track the number of free blocks and inodes in each flex_group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-31ext4: remove /proc tuning knobsTheodore Ts'o
Remove tuning knobs in /proc/fs/ext4/<dev/* since they have been replaced by knobs in sysfs at /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-31ext4: Add sysfs supportTheodore Ts'o
Add basic sysfs support so that information about the mounted filesystem and various tuning parameters can be accessed via /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-04ext4: fix ext4_free_inode() vs. ext4_claim_inode() raceEric Sandeen
I was seeing fsck errors on inode bitmaps after a 4 thread dbench run on a 4 cpu machine: Inode bitmap differences: -50736 -(50752--50753) etc... I believe that this is because ext4_free_inode() uses atomic bitops, and although ext4_new_inode() *used* to also use atomic bitops for synchronization, commit 393418676a7602e1d7d3f6e560159c65c8cbd50e changed this to use the sb_bgl_lock, so that we could also synchronize against read_inode_bitmap and initialization of uninit inode tables. However, that change left ext4_free_inode using atomic bitops, which I think leaves no synchronization between setting & unsetting bits in the inode table. The below patch fixes it for me, although I wonder if we're getting at all heavy-handed with this spinlock... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>