path: root/fs
Age Commit message Author
2011-08-31Merge tag 'for_linus-20110831' of git://github.com/tytso/ext4Linus Torvalds
* tag 'for_linus-20110831' of git://github.com/tytso/ext4: ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining
2011-08-31nfsd4: cleanup seqid op stateowner usageJ. Bruce Fields
Now that the replay owner is in the cstate we can remove it from a lot of other individual operations and further simplify nfs4_preprocess_seqid_op(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: centralize handling of replay ownersJ. Bruce Fields
Set the stateowner associated with a replay in one spot in nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing a few lines of boilerplate from all the nfs4_preprocess_seqid_op() callers. Also turn ENCODE_SEQID_OP_TAIL into a function while we're here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: make delegation stateid's seqid start at 1J. Bruce Fields
Thanks to Casey for reminding me that 5661 gives a special meaning to a value of 0 in the stateid's seqid field, so all stateid's should start out with si_generation 1. We were doing that in the open and lock cases for minorversion 1, but not for the delegation stateid, and not for openstateid's with v4.0. It doesn't *really* matter much for v4.0 or for delegation stateid's (which never get the seqid field incremented), but we may as well do the same for all of them. Reported-by: Casey Bodley <cbodley@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: simplify stateid generation code, fix wraparoundJ. Bruce Fields
Follow the recommendation from rfc3530bis for stateid generation number wraparound, simplify some code, and fix or remove incorrect comments. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
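The wraparound handling can be sketched in a few lines of C. This is a minimal illustration, not the nfsd code: it assumes the recommendation amounts to skipping the reserved value 0 when the 32-bit generation counter wraps (the commit message does not spell this out), and `next_generation` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from fs/nfsd: advance a stateid
 * generation counter. Assumes 0 is reserved, so a counter that wraps
 * past UINT32_MAX restarts at 1 instead of 0.
 */
static uint32_t next_generation(uint32_t gen)
{
    gen++;              /* unsigned arithmetic wraps modulo 2^32 */
    if (gen == 0)       /* wrapped: skip the reserved value */
        gen = 1;
    return gen;
}
```

With this helper a counter at `UINT32_MAX` advances to 1, so a stateid never carries a generation of 0 after it has been handed out.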
2011-08-31nfsd4: consolidate lock & open stateid tablesJ. Bruce Fields
There's no reason to have two separate hash tables for open and lock stateid's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: simplify distinguishing lock & open stateid'sJ. Bruce Fields
The trick free_stateid is using is a little cheesy, and we'll have more uses for this field later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: remove typoed replay fieldJ. Bruce Fields
Wow, I wonder how long that typo's been there. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd4: fix off-by-one-error in SEQUENCE replyJ. Bruce Fields
The values here represent highest slotid numbers. Since slotid's are numbered starting from zero, the highest should be one less than the number of slots. Reported-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
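The arithmetic behind this fix, as a small hedged sketch: `highest_slotid` is a hypothetical helper rather than a function from the patch, and it assumes the caller guarantees at least one slot.

```c
#include <stdint.h>

/*
 * Hypothetical helper: slot ids run from 0 to nr_slots - 1, so the
 * highest valid slot id is one less than the slot count.
 * Assumes nr_slots > 0.
 */
static uint32_t highest_slotid(uint32_t nr_slots)
{
    return nr_slots - 1;
}
```

A session with 16 slots therefore reports a highest slotid of 15, not 16.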
2011-08-31ext4: call ext4_handle_dirty_metadata with correct inode in ext4_dx_add_entryTheodore Ts'o
ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads that point to directory blocks assigned to the directory inode. However, the function calls ext4_handle_dirty_metadata with the inode of the file that's being added to the directory, not the directory inode itself. Therefore, correct the code to dirty the directory buffers with the directory inode, not the file inode. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: ext4_mkdir should dirty dir_block with newly created directory inodeDarrick J. Wong
ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir". Unfortunately, dir_block belongs to the newly created directory (which is "inode"), not the parent directory (which is "dir"). Fix the incorrect association. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: ext4_rename should dirty dir_bh with the correct directoryDarrick J. Wong
When ext4_rename performs a directory rename (move), dir_bh is a buffer that is modified to update the '..' link in the directory being moved (old_inode). However, ext4_handle_dirty_metadata is called with the old parent directory inode (old_dir) and dir_bh, which is incorrect because dir_bh does not belong to the parent inode. Fix this error. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: fake direct I/O mode for data=journalTheodore Ts'o
Currently, attempts to open a file with O_DIRECT in data=journal mode cause the open to fail with -EINVAL. This makes it very hard to test data=journal mode. So we will let the open succeed, but then always fall back to O_DSYNC buffered writes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-31ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodesTheodore Ts'o
This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complainingJiaying Zhang
The i_mutex lock and flush_completed_IO() added by commit 2581fdc810 in ext4_evict_inode() cause lockdep to complain about potential deadlocks in several places. In most/all of these LOCKDEP complaints it looks like it's a false positive, since many of the potential circular locking cases can't take place by the time ext4_evict_inode() is called; but since at the very least it may mask real problems, we need to address this. This change removes the flush_completed_IO() and i_mutex lock in ext4_evict_inode(). Instead, we take a different approach to resolve the software lockup that commit 2581fdc810 intends to fix. Rather than having the ext4-dio-unwritten thread wait to grab the i_mutex lock of an inode, we use mutex_trylock() instead, and simply requeue the work item if we fail to grab the inode's i_mutex lock. This should speed up work queue processing in general and also prevent the following deadlock scenario: During a page fault, shrink_icache_memory is called, which in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait(), which waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A has been held since before the page fault happened, we enter a deadlock. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
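The trylock-and-requeue idea can be illustrated in user space. The sketch below is not the ext4 code: `struct fake_inode`, `try_lock`, `unlock` and `end_io_work` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel's `i_mutex` and `mutex_trylock()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with pending io_end work. */
struct fake_inode {
    atomic_flag i_mutex;    /* set = locked; stand-in for the real mutex */
    int pending_io;         /* number of unfinished io_end items */
};

/* Non-blocking lock attempt: true if we took the lock. */
static bool try_lock(struct fake_inode *ino)
{
    return !atomic_flag_test_and_set(&ino->i_mutex);
}

static void unlock(struct fake_inode *ino)
{
    atomic_flag_clear(&ino->i_mutex);
}

/*
 * Worker body. Returns true if the work completed, false if the lock
 * was busy and the caller should put the item back on the queue
 * instead of sleeping on the lock.
 */
static bool end_io_work(struct fake_inode *ino)
{
    if (!try_lock(ino))
        return false;       /* busy: requeue, do not block the worker */
    ino->pending_io = 0;    /* finish the pending conversions */
    unlock(ino);
    return true;
}
```

Because the worker never sleeps on inode A's lock, work queued behind it for inode B can still make progress, which is the property the commit message relies on to break the deadlock.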
2011-08-31nfsd: remove include/linux/nfsd/syscall.hJ. Bruce Fields
We don't need this any more. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-30ext2: fix the outdated comment in ext2_nfs_get_inode()Li Haifeng
Signed-off-by: Li Haifeng <omycle@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-08-27nfsd4: remove redundant is_open_owner checkJ. Bruce Fields
When called with OPEN_STATE, preprocess_seqid_op only returns an open stateid, hence only an open owner. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: get lock checks out of preprocess_seqid_opJ. Bruce Fields
We've got some lock-specific code here in nfs4_preprocess_seqid_op which is only used by nfsd4_lock(). Move it to the caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: simplify lock openmode checkJ. Bruce Fields
Note that the special handling for the lock stateid case is already done by nfs4_check_openmode() (as of 02921914170e3b7fea1cd82dac9713685d2de5e2 "nfsd4: fix openmode checking on IO using lock stateid") so we no longer need these two cases in the caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: cleanup and consolidate seqid_mutating_errJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: remove HAS_SESSIONJ. Bruce Fields
This flag doesn't really buy us anything. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: cleanup lock/stateowner initializationJ. Bruce Fields
Share some common code, stop doing silly things like initializing a list head immediately before adding it to a list, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: name openowner data structures more clearlyJ. Bruce Fields
These appear to be generic (for both open and lock owners), but they're actually just for open owners. This has confused me more than once. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: replace some macros by functionsJ. Bruce Fields
For all the usual reasons. (Type safety, readability.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: stop using nfserr_resource for transitory errorsJ. Bruce Fields
The server is returning nfserr_resource for both permanent errors and for errors (like allocation failures) that might be resolved by retrying later. Save nfserr_resource for the former and use delay/jukebox for the latter. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: fix failure to end nfsd4 grace periodBoaz Harrosh
Even if we fail to write a recovery record, we should still mark the client as having acquired its first state. Otherwise we leave 4.1 clients with indefinite ERR_GRACE returns. However, an inability to write stable storage records may cause failures of reboot recovery, and the problem should still be brought to the server administrator's attention. So, make sure the error is logged. These errors shouldn't normally be triggered on a correctly functioning server--this isn't a case where a misconfigured client could spam the logs. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: simplify recovery dir settingJ. Bruce Fields
Move around some of this code, simplify a bit. Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd: prettify NFSD_MAY_* flag definitionsJ. Bruce Fields
Acked-by: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: permit read opens of executable-only filesJ. Bruce Fields
A client that wants to execute a file must be able to read it. Read opens over nfs are therefore implicitly allowed for executable files even when those files are not readable. NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on read requests, but NFSv4 has gotten this wrong ever since dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on open", when we realized that the file owner shouldn't override permissions on non-reclaim NFSv4 opens. So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow reads of executable files. So, do the same thing we do whenever we encounter another weird NFS permission nit: define yet another NFSD_MAY_* flag. The industry's future standardization on 128-bit processors will be motivated primarily by the need for integers with enough bits for all the NFSD_MAY_* flags. Reported-by: Leonardo Borda <leonardoborda@gmail.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
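A rough sketch of how such a flag might be honoured in a permission check. Everything here is illustrative: the `FAKE_*` constants, their values and `fake_permission` are hypothetical and do not reproduce nfsd_permission(); the sketch only assumes that the new flag lets a plain read request succeed when the caller has execute permission.

```c
#include <stdbool.h>

/* Illustrative values only; not the kernel's NFSD_MAY_* definitions. */
#define FAKE_MAY_EXEC         0x01u
#define FAKE_MAY_READ         0x04u
#define FAKE_MAY_READ_IF_EXEC 0x80u   /* request modifier, not a mode bit */

/*
 * granted: permission bits the caller holds on the file.
 * access:  what the caller asks for, plus optional modifier flags.
 */
static bool fake_permission(unsigned granted, unsigned access)
{
    unsigned want = access & (FAKE_MAY_EXEC | FAKE_MAY_READ);

    if ((granted & want) == want)
        return true;                        /* normal check passed */

    /* Read denied: allow it anyway if the file is executable and the
     * caller opted in with the modifier flag. */
    if (want == FAKE_MAY_READ && (access & FAKE_MAY_READ_IF_EXEC))
        return (granted & FAKE_MAY_EXEC) != 0;

    return false;
}
```

Keeping the exception behind an explicit flag means only callers that opt in (here, read opens) get the relaxed check, while other paths keep the strict behaviour.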
2011-08-26Remove include/linux/nfsd/const.hJ. Bruce Fields
Userspace shouldn't have a use for these constants. Nothing here is used outside fs/nfsd. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26nfsd4: it's OK to return nfserr_symlinkJ. Bruce Fields
The nfsd4 code has a bunch of special exceptions for error returns which map nfserr_symlink to other errors. In fact, the spec makes it clear that nfserr_symlink is to be preferred over less specific errors where possible. The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink related error returns.", which claims that these special exceptions represent an NFSv4 break from v2/v3 tradition--when in fact the symlink error was introduced with v4. I suspect what happened was pynfs tests were written that were overly faithful to the (known-incomplete) rfc3530 error return lists, and then code was fixed up mindlessly to make the tests pass. Delete these unnecessary exceptions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26nfsd4: fix incorrect comment in nfsd4_set_nfs4_aclJ. Bruce Fields
Zero means "I don't care what kind of file this is". And that's probably what we want--acls are also settable at least on directories, and if the filesystem doesn't want them on other objects, leave it to it to complain. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26nfsd: clean up nfsd_mode_check()J. Bruce Fields
Add some more comments, simplify logic, do & S_IFMT just once, name "type" more helpfully. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26nfsd: open-code special directory-hardlink checkJ. Bruce Fields
We allow the fh_verify caller to specify that any object *except* those of a given type is allowed, by passing a negative type. But only one caller actually uses it. Open-code that check in the one caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26nfsd4: clean up S_IS -> NF4 file type mappingJ. Bruce Fields
A slightly unconventional approach to make the code more compact I could live with, but let's give the poor reader *some* chance. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26All Arch: remove linkage for sys_nfsservctl system callNeilBrown
The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26UBIFS: fix the dark space calculationsrimugunthan dhandapani
The dark space calculation should be 64 bit type-casted, when assigning to tmp64 (similar to how total_free is calculated). Overflow will occur for very large flashes. Signed-off-by: srimugunthan <srimugunthan.dhandapani@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
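The effect of the missing cast can be shown with a small example. This is a sketch under stated assumptions, not the UBIFS code: the function and parameter names are hypothetical, and unsigned 32-bit operands are used so that the truncation is well-defined wraparound rather than signed overflow.

```c
#include <stdint.h>

/*
 * Buggy pattern: both operands are 32-bit, so the product is computed
 * in 32 bits and wraps before it is widened to the 64-bit result.
 */
static uint64_t dark_space_truncated(uint32_t leb_cnt, uint32_t dark_wm)
{
    return leb_cnt * dark_wm;
}

/*
 * Fixed pattern: widen one operand first so the multiplication itself
 * is carried out in 64 bits.
 */
static uint64_t dark_space_wide(uint32_t leb_cnt, uint32_t dark_wm)
{
    return (uint64_t)leb_cnt * dark_wm;
}
```

For 1048576 erase blocks with 8192 bytes each, the true product is 2^33; the first function loses the high bits entirely, while the second returns the full value.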
2011-08-25lockdep: Add helper function for dir vs file i_mutex annotationJosh Boyer
Purely in-memory filesystems do not use the inode hash as the dcache tells us if an entry already exists. As a result, they do not call unlock_new_inode, and thus directory inodes do not get put into a different lockdep class for i_sem. We need the different lockdep classes, because the locking order for i_mutex is different for directory inodes and regular inodes. Directory inodes can do "readdir()", which takes i_mutex *before* possibly taking mm->mmap_sem (due to a page fault while copying the directory entry to user space). In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem before accessing i_mutex. The two cases can never happen for the same inode, so no real deadlock can occur, but without the different lockdep classes, lockdep cannot understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this can lead to false positives from lockdep like below: find/645 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}: [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361 [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45 [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110 [<ffffffff81111557>] mmap_region+0x258/0x432 [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306 [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a [<ffffffff8100c858>] sys_mmap+0x22/0x24 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff81109541>] might_fault+0x89/0xac [<ffffffff81149cff>] filldir+0x6f/0xc7 [<ffffffff811586ea>] dcache_readdir+0x67/0x205 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b This patch moves the directory vs file lockdep annotation into a helper function that can be called by in-memory filesystems and has hugetlbfs call it. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25xfs: deprecate the nodelaylog mount optionChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-24NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN errorTrond Myklebust
The NFSv4 spec does not specify that the server must repeat that error, so in order to avoid having the delegations revoked, we should handle it immediately. Also note that NFS4ERR_CB_PATH_DOWN does in fact renew the lease... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegationTrond Myklebust
RFC3530 states that if the client holds a delegation, then it is obliged to continue to send RENEW calls once every lease period in order to allow the server to return NFS4ERR_CB_PATH_DOWN if the callback path is unreachable. This is not required for NFSv4.1, since the server can at any time set the SEQ4_STATUS_CB_PATH_DOWN_SESSION in any SEQUENCE operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: nfs4_proc_renew should be declared staticTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocationTrond Myklebust
We shouldn't allow the renew daemon to do direct reclaim on the NFS partition. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation
2011-08-24fuse: check size of FUSE_NOTIFY_INVAL_ENTRY messageMiklos Szeredi
FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the message processing could overrun and result in a "kernel BUG at fs/fuse/dev.c:629!" Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
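The kind of length check described here can be sketched as follows. The struct layout, the names, and the assumption that the name is followed by a terminating NUL are all illustrative; this is not the code from fs/fuse/dev.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header preceding a variable-length name. */
struct inval_entry_hdr {
    uint64_t parent;
    uint32_t namelen;
    uint32_t padding;
};

/*
 * Returns 0 if the write size is exactly header + name + NUL byte,
 * -1 otherwise. Validating before copying keeps later processing
 * from reading past the end of the message.
 */
static int check_inval_entry_size(size_t size,
                                  const struct inval_entry_hdr *hdr)
{
    if (size < sizeof(*hdr))
        return -1;      /* too short to hold the header */
    if (size - sizeof(*hdr) != (size_t)hdr->namelen + 1)
        return -1;      /* payload does not match the declared name length */
    return 0;
}
```

Rejecting a mismatched size up front turns a malformed message into an ordinary error return instead of an overrun.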
2011-08-23Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix tracing builds inside the source tree xfs: remove subdirectories xfs: don't expect xfs headers to be in subdirectories
2011-08-23block: add GENHD_FL_NO_PART_SCANTejun Heo
There are cases where suppressing partition scan is useful - e.g. for lo devices and pseudo SATA devices which advertise to be a disk but get upset on partition scan (some port multiplier control devices show such behavior). This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan regardless of the number of possible partitions. disk_partitionable() is renamed to disk_part_scan_enabled() as suppressing partition scan doesn't imply the device can't be partitioned using BLKPG_ADD/DEL_PARTITION calls from userland. show_partition() now directly tests disk_max_parts() to maintain backward-compatibility. -v2: Updated to make it clear that only partition scan is suppressed not partitioning itself as suggested by Kay Sievers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
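The distinction between "can be partitioned" and "should be scanned" can be sketched like this. The `fake_*` names and the flag value are hypothetical stand-ins for the genhd definitions, and the predicate is an assumption about the intended semantics rather than a copy of the kernel helper.

```c
#include <stdbool.h>

#define FAKE_FL_NO_PART_SCAN 0x0200u    /* illustrative value only */

/* Hypothetical stand-in for struct gendisk. */
struct fake_disk {
    int minors;             /* number of minors, i.e. possible partitions */
    unsigned int flags;
};

static int fake_disk_max_parts(const struct fake_disk *d)
{
    return d->minors;
}

/*
 * Scan only if the disk can hold partitions at all and the driver has
 * not asked for the scan to be suppressed.
 */
static bool fake_part_scan_enabled(const struct fake_disk *d)
{
    return fake_disk_max_parts(d) > 1 &&
           !(d->flags & FAKE_FL_NO_PART_SCAN);
}
```

Note that setting the flag leaves `fake_disk_max_parts()` unchanged, which mirrors the point in the message that suppressing the scan does not prevent userland from adding partitions explicitly.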
2011-08-23block: separate priority boosting from REQ_METAChristoph Hellwig
Add a new REQ_PRIO to let requests preempt others in the cfq I/O scheduler, and leave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
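A small sketch of the split into two independent flags. The `FAKE_*` bit values and both helper functions are hypothetical; they only illustrate that, after this change, marking a request as metadata and asking for scheduler priority are separate decisions.

```c
#include <stdbool.h>

/* Illustrative bit positions; not the kernel's REQ_* values. */
#define FAKE_REQ_META (1u << 0)     /* metadata marker, for tracing only */
#define FAKE_REQ_PRIO (1u << 1)     /* ask the scheduler to boost this request */

/* Tracing looks only at the metadata marker. */
static bool traced_as_metadata(unsigned int flags)
{
    return (flags & FAKE_REQ_META) != 0;
}

/* The scheduler's preemption decision looks only at the priority bit. */
static bool may_preempt(unsigned int flags)
{
    return (flags & FAKE_REQ_PRIO) != 0;
}
```

A caller that wants the old behaviour sets both bits, while a caller that sets only the metadata bit is traced as metadata without any boost.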
2011-08-23block: remove READ_META and WRITE_METAChristoph Hellwig
Replace all occurrences of the undocumented READ_META with READ | REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>