summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-10-02btrfs: hooks for readaheadArne Jansen
This adds the hooks needed for readahead. In the readpage_end_io_hook, the extent state is checked for the EXTENT_READAHEAD flag. Only in this case the readahead hook is called, to keep the impact on non-ra as low as possible. Additionally, a hook for a failed IO is added, otherwise readahead would wait indefinitely for the extent to finish. Changes for v2: - eliminate race condition Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02btrfs: initial readahead code and prototypesArne Jansen
This is the implementation for the generic read ahead framework. To trigger a readahead, btrfs_reada_add must be called. It will start a read ahead for the given range [start, end) on tree root. The returned handle can either be used to wait on the readahead to finish (btrfs_reada_wait), or to send it to the background (btrfs_reada_detach). The read ahead works as follows: On btrfs_reada_add, the root of the tree is inserted into a radix_tree. reada_start_machine will then search for extents to prefetch and trigger some reads. When a read finishes for a node, all contained node/leaf pointers that lie in the given range will also be enqueued. The reads will be triggered in sequential order, thus giving a big win over a naive enumeration. It will also make use of multi-device layouts. Each disk will have its on read pointer and all disks will by utilized in parallel. Also will no two disks read both sides of a mirror simultaneously, as this would waste seeking capacity. Instead both disks will read different parts of the filesystem. Any number of readaheads can be started in parallel. The read order will be determined globally, i.e. 2 parallel readaheads will normally finish faster than the 2 started one after another. Changes v2: - protect root->node by transaction instead of node_lock - fix missed branches: The readahead had a too simple check to determine if a branch from a node should be checked or not. It now also records the upper bound of each node to see if the requested RA range lies within. - use KERN_CONT to debug output, to avoid line breaks - defer reada_start_machine to worker to avoid deadlock Changes v3: - protect root->node by rcu Changes v5: - changed EIO-semantics of reada_tree_block_flagged - remove spin_lock from reada_control and make elems an atomic_t - remove unused read_total from reada_control - kill reada_key_cmp, use btrfs_comp_cpu_keys instead - use kref-style release functions where possible - return struct reada_control * instead of void * from btrfs_reada_add Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02btrfs: state information for readaheadArne Jansen
Add state information for readahead to btrfs_fs_info and btrfs_device Changes v2: - don't wait in radix_trees - add own set of workers for readahead Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02btrfs: add READAHEAD extent buffer flagArne Jansen
Add a READAHEAD extent buffer flag. Add a function to trigger a read with this flag set. Changes v2: - use extent buffer flags instead of extent state flags Changes v5: - adapt to changed read_extent_buffer_pages interface - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02btrfs: add an extra wait mode to read_extent_buffer_pagesArne Jansen
read_extent_buffer_pages currently has two modes, either trigger a read without waiting for anything, or wait for the I/O to finish. The former also bails when it's unable to lock the page. This patch now adds an additional parameter to allow it to block on page lock, but don't wait for completion. Changes v5: - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and WAIT_PAGE_LOCK Change v6: - fix bug introduced in v5 Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-09-30Merge branch 'btrfs-3.0' into for-linusChris Mason
2011-09-30Btrfs: force a page fault if we have a shorty copy on a page boundaryJosef Bacik
A user reported a problem where ceph was getting into 100% cpu usage while doing some writing. It turns out it's because we were doing a short write on a not uptodate page, which means we'd fall back at one page at a time and fault the page in. The problem is our position is on the page boundary, so our fault in logic wasn't actually reading the page, so we'd just spin forever or until the page got read in by somebody else. This will force a readpage if we end up doing a short copy. Alexandre could reproduce this easily with ceph and reports it fixes his problem. I also wrote a reproducer that no longer hangs my box with this patch. Thanks, Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-09-29btrfs: integrating raid-repair and scrub-fixup-nodatasumJan Schmidt
This ties nodatasum fixup in scrub together with raid repair patches. While both series are working fine alone, scrub will report uncorrectable errors if they occur in a nodatasum extent *and* the page is in the page cache. Previously, we would have triggered readpage to find good data and do the repair. However, readpage wouldn't read anything in the case where the page is up to date in the cache. So, we simply take that good data we have and call repair_io_failure directly (unless the page in the cache is dirty). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: Moved repair code from inode.c to extent_io.cJan Schmidt
The raid-retry code in inode.c can be generalized so that it works for metadata as well. Thus, this patch moves it to extent_io.c and makes the raid-retry code a raid-repair code. Repair works that way: Whenever a read error occurs and we have more mirrors to try, note the failed mirror, and retry another. If we find a good one, check if we did note a failure earlier and if so, do not allow the read to complete until after the bad sector was written with the good data we just fetched. As we have the extent locked while reading, no one can change the data in between. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: Put mirror_num in bi_bdevJan Schmidt
The error correction code wants to make sure that only the bad mirror is rewritten. Thus, we need to know which mirror is the bad one. I did not find a more apropriate field than bi_bdev. But I think using this is fine, because it is modified by the block layer, anyway, and should not be read after the bio returned. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: Do not use bio->bi_bdev after submissionJan Schmidt
The block layer modifies bio->bi_bdev and bio->bi_sector while working on the bio, they do _not_ come back unmodified in the completion callback. To call add_page, we need at least some bi_bdev set, which is why the code was working, previously. With this patch, we use the latest_bdev from fsinfo instead of the leftover in the bio. This gives us the possibility to use the bi_bdev field for another purpose. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: btrfs_multi_bio replaced with btrfs_bioJan Schmidt
btrfs_bio is a bio abstraction able to split and not complete after the last bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio tracks the mirror_num used to read data which can be used for error correction purposes. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: new ioctls to do logical->inode and inode->path resolvingJan Schmidt
these ioctls make use of the new functions initially added for scrub. they return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and all paths belonging to an inode (BTRFS_IOC_INO_PATHS). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs scrub: add fixup code for errors on nodatasum filesJan Schmidt
This removes a FIXME comment and introduces the first part of nodatasum fixup: It gets the corresponding inode for a logical address and triggers a regular readpage for the corrupted sector. Once we have on-the-fly error correction our error will be automatically corrected. The correction code is expected to clear the newly introduced EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead of "uncorrectable" eventually. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs scrub: use int for mirror_num, not u64Jan Schmidt
the rest of the code uses int mirror_num, and so should scrub Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: add mirror_num to extent_read_full_pageJan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror 0, which generally is the best we can do. To add flexibility, pass it as a parameter. This will be needed by scrub fixup code. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs scrub: bugfix: mirror_num off by oneJan Schmidt
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code did not use mirror_num for anything important and that error went unnoticed. The nodatasum fixup patch of this set depends on a correct mirror_num. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs scrub: print paths of corrupted filesJan Schmidt
While scrubbing, we may encounter various errors. Previously, a logical address was printed to the log only. Now, all paths belonging to that address are resolved and printed separately. That should work for hardlinks as well as reflinks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs scrub: added unverified_errorsJan Schmidt
In normal operation, scrub is reading data sequentially in large portions. In case of an i/o error, we try to find the corrupted area(s) by issuing page sized read requests. With this commit we increment the unverified_errors counter if all of the small size requests succeed. Userland patches carrying such conspicous events to the administrator should already be around. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: added helper functions to iterate backrefsJan Schmidt
These helper functions iterate back references and call a function for each backref. There is also a function to resolve an inode to a path in the file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-27CIFS: Don't free volume_info->UNC until we are entirely done with it.Jesper Juhl
In cleanup_volume_info_contents() we kfree(volume_info->UNC); and then proceed to use that variable on the very next line. This causes (at least) Coverity Prevent to complain about use-after-free of that variable (and I guess other checkers may do that as well). There's not any /real/ problem here since we are just using the value of the pointer, not actually dereferencing it, but it's still trivial to silence the tool, so why not? To me at least it also just seems nicer to defer freeing the variable until we are entirely done with it in all respects. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27doc: fix broken referencesPaul Bolle
There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typo's in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27vfs: remove LOOKUP_NO_AUTOMOUNT flagLinus Torvalds
That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26VFS: Fix the remaining automounter semantics regressionsTrond Myklebust
The concensus seems to be that system calls such as stat() etc should not trigger an automount. Neither should the l* versions. This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups that _should_ trigger an automount on the last path element. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Edited to leave out the cases that are already covered by LOOKUP_OPEN, LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds
Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26sysfs: add unsigned long cast to prevent compile warningHeiko Carstens
"sysfs: use rb-tree for inode number lookup" added a new printk which causes a new compile warning on s390 (and few other architectures): fs/sysfs/dir.c: In function 'sysfs_link_sibling': fs/sysfs/dir.c:63:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'ino_t' [-Wform Add an explicit unsigned long cast since ino_t is an unsigned long on most architectures. Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-26nfsd4: look up stateid's per clientidJ. Bruce Fields
Use a separate stateid idr per client, and lookup a stateid by first finding the client, then looking up the stateid relative to that client. Also some minor refactoring. This allows us to improve error returns: we can return expired when the clientid is not found and bad_stateid when the clientid is found but not the stateid, as opposed to returning expired for both cases. I hope this will also help to replace the state lock mostly by a per-client lock, but that hasn't been done yet. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-26nfsd4: assume test_stateid always has sessionJ. Bruce Fields
Test_stateid is 4.1-only and only allowed after a sequence operation, so this check is unnecessary. Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-26nfsd4: use idr for stateid'sJ. Bruce Fields
The idr system is designed exactly for generating id and looking up integer id's. Thanks to Trond for pointing it out. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-26nfsd4: move client * to nfs4_stateid, add init_stid helperJ. Bruce Fields
This will be convenient. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: floppy: use del_timer_sync() in init cleanup blk-cgroup: be able to remove the record of unplugged device block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request mm: Add comment explaining task state setting in bdi_forker_thread() mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread() block: simplify force plug flush code a little bit block: change force plug flush call order block: Fix queue_flag update when rq_affinity goes from 2 to 1 block: separate priority boosting from REQ_META block: remove READ_META and WRITE_META xen-blkback: fixed indentation and comments xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
2011-09-21teach /proc/$pid/numa_maps about transparent hugepagesDave Hansen
This is modeled after the smaps code. It detects transparent hugepages and then does a single gather_stats() for the page as a whole. This has two benifits: 1. It is more efficient since it does many pages in a single shot. 2. It does not have to break down the huge page. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-21break out numa_maps gather_pte_stats() checksDave Hansen
gather_pte_stats() does a number of checks on a target page to see whether it should even be considered for statistics. This breaks that code out in to a separate function so that we can use it in the transparent hugepage case in the next patch. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Christoph Lameter <cl@gentwo.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-21make /proc/$pid/numa_maps gather_stats() take variable page sizeDave Hansen
We need to teach the numa_maps code about transparent huge pages. The first step is to teach gather_stats() that the pte it is dealing with might represent more than one page. Note that will we use this in a moment for transparent huge pages since they have use a single pmd_t which _acts_ as a "surrogate" for a bunch of smaller pte_t's. I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes for hugetlbfs pages and PAGE_SIZE for normal pages. That means that to figure out how many _bytes_ "dirty=1" means, you must first know the hugetlbfs page size. That's easier said than done especially if you don't have visibility in to the mount. But, that's probably a discussion for another day especially since it would change behavior to fix it. But, just in case anyone wonders why this patch only passes a '1' in the hugetlb case... Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-21leases: split up generic_setlease into lock/unlock casesJ. Bruce Fields
Eventually we should probably do the same thing to the file operations as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-20Merge branch 'for-linus' of git://github.com/chrismason/linuxLinus Torvalds
* 'for-linus' of git://github.com/chrismason/linux: Btrfs: reserve sufficient space for ioctl clone
2011-09-20Merge branch 'btrfs-3.0' into for-linusChris Mason
2011-09-20Btrfs: reserve sufficient space for ioctl cloneSage Weil
Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We need to reserve space for: - adjusting the old extent (possibly splitting it) - adding the new extent - updating the inode Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-09-20nfsd4: make op_cacheresult another flagJ. Bruce Fields
I'm not sure why I used a new field for this originally. Also, the differences between some of these flags are a little subtle; add some comments to explain. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-20nfsd4: fix open downgrade, againJ. Bruce Fields
Yet another open-management regression: - nfs4_file_downgrade() doesn't remove the BOTH access bit on downgrade, so the server's idea of the stateid's access gets out of sync with the client's. If we want to keep an O_RDWR open in this case, we should do that in the file_put_access logic rather than here. - We forgot to convert v4 access to an open mode here. This logic has proven too hard to get right. In the future we may consider: - reexamining the lock/openowner relationship (locks probably don't really need to take their own references here). - adding open upgrade/downgrade support to the vfs. - removing the atomic operations. They're redundant as long as this is all under some other lock. Also, maybe some kind of additional static checking would help catch O_/NFS4_SHARE_ACCESS confusion. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-19cifs: Fix broken sec=ntlmv2/i sec option (try #2)Shirish Pargaonkar
Fix sec=ntlmv2/i authentication option during mount of Samba shares. cifs client was coding ntlmv2 response incorrectly. All that is needed in temp as specified in MS-NLMP seciton 3.3.2 "Define ComputeResponse(NegFlg, ResponseKeyNT, ResponseKeyLM, CHALLENGE_MESSAGE.ServerChallenge, ClientChallenge, Time, ServerName) as Set temp to ConcatenationOf(Responserversion, HiResponserversion, Z(6), Time, ClientChallenge, Z(4), ServerName, Z(4)" is MsvAvNbDomainName. For sec=ntlmsspi, build_av_pair is not used, a blob is plucked from type 2 response sent by the server to use in authentication. I tested sec=ntlmv2/i and sec=ntlmssp/i mount options against Samba (3.6) and Windows - XP, 2003 Server and 7. They all worked. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-09-19Fix the conflict between rwpidforward and rw mount optionsSteve French
Both these options are started with "rw" - that's why the first one isn't switched on even if it is specified. Fix this by adding a length check for "rw" option check. Cc: <stable@kernel.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-09-19CIFS: Fix ERR_PTR dereference in cifs_get_rootPavel Shilovsky
move it to the beginning of the loop. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-09-19cifs: fix possible memory corruption in CIFSFindNextJeff Layton
The name_len variable in CIFSFindNext is a signed int that gets set to the resume_name_len in the cifs_search_info. The resume_name_len however is unsigned and for some infolevels is populated directly from a 32 bit value sent by the server. If the server sends a very large value for this, then that value could look negative when converted to a signed int. That would make that value pass the PATH_MAX check later in CIFSFindNext. The name_len would then be used as a length value for a memcpy. It would then be treated as unsigned again, and the memcpy scribbles over a ton of memory. Fix this by making the name_len an unsigned value in CIFSFindNext. Cc: <stable@kernel.org> Reported-by: Darren Lavender <dcl@hppine99.gbr.hp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-09-19Merge branch 'for-linus' of git://github.com/chrismason/linuxLinus Torvalds
* 'for-linus' of git://github.com/chrismason/linux: Btrfs: only clear the need lookup flag after the dentry is setup BTRFS: Fix lseek return value for error Btrfs: don't change inode flag of the dest clone file Btrfs: don't make a file partly checksummed through file clone Btrfs: fix pages truncation in btrfs_ioctl_clone() btrfs: fix d_off in the first dirent
2011-09-19nfsd4: hash closed stateid's like any otherJ. Bruce Fields
Look up closed stateid's in the stateid hash like any other stateid rather than searching the close lru. This is simpler, and fixes a bug: currently we handle only the case of a close that is the last close for a given stateowner, but not the case of a close for a stateowner that still has active opens on other files. Thus in a case like: open(owner, file1) open(owner, file2) close(owner, file2) close(owner, file2) the final close won't be recognized as a retransmission. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-19nfsd4: construct stateid from clientid and counterJ. Bruce Fields
Including the full clientid in the on-the-wire stateid allows more reliable detection of bad vs. expired stateid's, simplifies code, and ensures we won't reuse the opaque part of the stateid (as we currently do when the same openowner closes and reopens the same file). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-18Btrfs: only clear the need lookup flag after the dentry is setupJosef Bacik
We can race with readdir and the RCU path walking stuff. This is because we clear the need lookup flag before actually instantiating the inode. This will lead the RCU path walk stuff to find a dentry it thinks is valid without a d_inode attached. So instead unhash the dentry when we first start the lookup, and then clear the flag after we've instantiated the dentry so we're garunteed to either try the slow lookup, or have the d_inode set properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-09-18BTRFS: Fix lseek return value for errorJeff Liu
The recent reworking of btrfs' lseek lead to incorrect values being returned. This adds checks for seeking beyond EOF in SEEK_HOLE and makes sure the error values come back correct. Andi Kleen also sent in similar patches. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-09-18Merge branch 'btrfs-3.0' into for-linusChris Mason