linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2011-09-03	nfsd4: move double-confirm test to open_confirm	J. Bruce Fields
	I don't see the point of having this check in nfs4_preprocess_seqid_op() when it's only needed by the one caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-02	nfsd4: simplify check_open logic	J. Bruce Fields
	Sometimes the single-exit style is good, sometimes it's unnecessarily convoluted.... Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-02	nfsd4: share common seqid checks	J. Bruce Fields
	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-02	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds
	* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix ->write_inode return values xfs: fix xfs_mark_inode_dirty during umount xfs: deprecate the nodelaylog mount option
2011-09-01	nfsd4: eliminate unused lt_stateowner	J. Bruce Fields
	This is used only as a local variable. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-01	nfsd4: drop most stateowner refcounting	J. Bruce Fields
	Maybe we'll bring it back some day, but we don't have much real use for it now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-01	xfs: fix ->write_inode return values	Christoph Hellwig
	Currently we always redirty an inode that was attempted to be written out synchronously but has been cleaned by an AIL pushed internall, which is rather bogus. Fix that by doing the i_update_core check early on and return 0 for it. Also include async calls for it, as doing any work for those is just as pointless. While we're at it also fix the sign for the EIO return in case of a filesystem shutdown, and fix the completely non-sensical locking around xfs_log_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> (cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97) Signed-off-by: Alex Elder <aelder@sgi.com>
2011-09-01	nfsd4: eliminate impossible open replay case	J. Bruce Fields
	If open fails with any error other than nfserr_replay_me, then the main nfsd4_proc_compound() loop continues unconditionally to nfsd4_encode_operation(), which will always call encode_seqid_op_tail. Thus the condition we check for here does not occur. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-01	nfsd4: extend state lock over seqid replay logic	J. Bruce Fields
	There are currently a couple races in the seqid replay code: a retransmission could come while we're still encoding the original reply, or a new seqid-mutating call could come as we're encoding a replay. So, extend the state lock over the encoding (both encoding of a replayed reply and caching of the original encoded reply). I really hate doing this, and previously added the stateowner reference-counting code to avoid it (which was insufficient)--but I don't see a less complicated alternative at the moment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	xfs: fix xfs_mark_inode_dirty during umount	Christoph Hellwig
	During umount we do not add a dirty inode to the lru and wait for it to become clean first, but force writeback of data and metadata with I_WILL_FREE set. Currently there is no way for XFS to detect that the inode has been redirtied for metadata operations, as we skip the mark_inode_dirty call during teardown. Fix this by setting i_update_core nanually in that case, so that the inode gets flushed during inode reclaim. Alternatively we could enable calling mark_inode_dirty for inodes in I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided against this as we will get better I/O patterns from reclaim compared to the synchronous writeout in write_inode_now, and always marking the inode dirty in some way from xfs_mark_inode_dirty is a better safetly net in either case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> (cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856) Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-31	Merge tag 'for_linus-20110831' of git://github.com/tytso/ext4	Linus Torvalds
	* tag 'for_linus-20110831' of git://github.com/tytso/ext4: ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining
2011-08-31	nfsd4: cleanup seqid op stateowner usage	J. Bruce Fields
	Now that the replay owner is in the cstate we can remove it from a lot of other individual operations and further simplify nfs4_preprocess_seqid_op(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: centralize handling of replay owners	J. Bruce Fields
	Set the stateowner associated with a replay in one spot in nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing a few lines of boilerplate from all the nfs4_preprocess_seqid_op() callers. Also turn ENCODE_SEQID_OP_TAIL into a function while we're here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: make delegation stateid's seqid start at 1	J. Bruce Fields
	Thanks to Casey for reminding me that 5661 gives a special meaning to a value of 0 in the stateid's seqid field, so all stateid's should start out with si_generation 1. We were doing that in the open and lock cases for minorversion 1, but not for the delegation stateid, and not for openstateid's with v4.0. It doesn't really matter much for v4.0 or for delegation stateid's (which never get the seqid field incremented), but we may as well do the same for all of them. Reported-by: Casey Bodley <cbodley@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: simplify stateid generation code, fix wraparound	J. Bruce Fields
	Follow the recommendation from rfc3530bis for stateid generation number wraparound, simplify some code, and fix or remove incorrect comments. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: consolidate lock & open stateid tables	J. Bruce Fields
	There's no reason to have two separate hash tables for open and lock stateid's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: simplify distinguishing lock & open stateid's	J. Bruce Fields
	The trick free_stateid is using is a little cheesy, and we'll have more uses for this field later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: remove typoed replay field	J. Bruce Fields
	Wow, I wonder how long that typo's been there. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	nfsd4: fix off-by-one-error in SEQUENCE reply	J. Bruce Fields
	The values here represent highest slotid numbers. Since slotid's are numbered starting from zero, the highest should be one less than the number of slots. Reported-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31	ext4: call ext4_handle_dirty_metadata with correct inode in ext4_dx_add_entry	Theodore Ts'o
	ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads that point to directory blocks assigned to the directory inode. However, the function calls ext4_handle_dirty_metadata with the inode of the file that's being added to the directory, not the directory inode itself. Therefore, correct the code to dirty the directory buffers with the directory inode, not the file inode. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31	ext4: ext4_mkdir should dirty dir_block with newly created directory inode	Darrick J. Wong
	ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir". Unfortunately, dir_block belongs to the newly created directory (which is "inode"), not the parent directory (which is "dir"). Fix the incorrect association. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31	ext4: ext4_rename should dirty dir_bh with the correct directory	Darrick J. Wong
	When ext4_rename performs a directory rename (move), dir_bh is a buffer that is modified to update the '..' link in the directory being moved (old_inode). However, ext4_handle_dirty_metadata is called with the old parent directory inode (old_dir) and dir_bh, which is incorrect because dir_bh does not belong to the parent inode. Fix this error. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31	ext4: fake direct I/O mode for data=journal	Theodore Ts'o
	Currently attempts to open a file with O_DIRECT in data=journal mode causes the open to fail with -EINVAL. This makes it very hard to test data=journal mode. So we will let the open succeed, but then always fall back to O_DSYNC buffered writes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-31	ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodes	Theodore Ts'o
	This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc:stable@kernel.org
2011-08-31	ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining	Jiaying Zhang
	The i_mutex lock and flush_completed_IO() added by commit 2581fdc810 in ext4_evict_inode() causes lockdep complaining about potential deadlock in several places. In most/all of these LOCKDEP complaints it looks like it's a false positive, since many of the potential circular locking cases can't take place by the time the ext4_evict_inode() is called; but since at the very least it may mask real problems, we need to address this. This change removes the flush_completed_IO() and i_mutex lock in ext4_evict_inode(). Instead, we take a different approach to resolve the software lockup that commit 2581fdc810 intends to fix. Rather than having ext4-dio-unwritten thread wait for grabing the i_mutex lock of an inode, we use mutex_trylock() instead, and simply requeue the work item if we fail to grab the inode's i_mutex lock. This should speed up work queue processing in general and also prevents the following deadlock scenario: During page fault, shrink_icache_memory is called that in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait() that waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still hold before the page fault happened, we enter a deadlock. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-31	nfsd: remove include/linux/nfsd/syscall.h	J. Bruce Fields
	We don't need this any more. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-30	ext2: fix the outdated comment in ext2_nfs_get_inode()	Li Haifeng
	Signed-off-by: Li Haifeng <omycle@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-08-27	nfsd4: remove redundant is_open_owner check	J. Bruce Fields
	When called with OPEN_STATE, preprocess_seqid_op only returns an open stateid, hence only an open owner. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: get lock checks out of preprocess_seqid_op	J. Bruce Fields
	We've got some lock-specific code here in nfs4_preprocess_seqid_op which is only used by nfsd4_lock(). Move it to the caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: simplify lock openmode check	J. Bruce Fields
	Note that the special handling for the lock stateid case is already done by nfs4_check_openmode() (as of 02921914170e3b7fea1cd82dac9713685d2de5e2 "nfsd4: fix openmode checking on IO using lock stateid") so we no longer need these two cases in the caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: cleanup and consolidate seqid_mutating_err	J. Bruce Fields
	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: remove HAS_SESSION	J. Bruce Fields
	This flag doesn't really buy us anything. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: cleanup lock/stateowner initialization	J. Bruce Fields
	Share some common code, stop doing silly things like initializing a list head immediately before adding it to a list, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: name openowner data structures more clearly	J. Bruce Fields
	These appear to be generic (for both open and lock owners), but they're actually just for open owners. This has confused me more than once. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: replace some macros by functions	J. Bruce Fields
	For all the usual reasons. (Type safety, readability.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: stop using nfserr_resource for transitory errors	J. Bruce Fields
	The server is returning nfserr_resource for both permanent errors and for errors (like allocation failures) that might be resolved by retrying later. Save nfserr_resource for the former and use delay/jukebox for the latter. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: fix failure to end nfsd4 grace period	Boaz Harrosh
	Even if we fail to write a recovery record, we should still mark the client as having acquired its first state. Otherwise we leave 4.1 clients with indefinite ERR_GRACE returns. However, an inability to write stable storage records may cause failures of reboot recovery, and the problem should still be brought to the server administrator's attention. So, make sure the error is logged. These errors shouldn't normally be triggered on a corectly functioning server--this isn't a case where a misconfigured client could spam the logs. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: simplify recovery dir setting	J. Bruce Fields
	Move around some of this code, simplify a bit. Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd: prettify NFSD_MAY_* flag definitions	J. Bruce Fields
	Acked-by: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: permit read opens of executable-only files	J. Bruce Fields
	A client that wants to execute a file must be able to read it. Read opens over nfs are therefore implicitly allowed for executable files even when those files are not readable. NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on read requests, but NFSv4 has gotten this wrong ever since dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on open", when we realized that the file owner shouldn't override permissions on non-reclaim NFSv4 opens. So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow reads of executable files. So, do the same thing we do whenever we encounter another weird NFS permission nit: define yet another NFSD_MAY_* flag. The industry's future standardization on 128-bit processors will be motivated primarily by the need for integers with enough bits for all the NFSD_MAY_* flags. Reported-by: Leonardo Borda <leonardoborda@gmail.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	Remove include/linux/nfsd/const.h	J. Bruce Fields
	Userspace shouldn't have a use for these constants. Nothing here is used outside fs/nfsd. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: it's OK to return nfserr_symlink	J. Bruce Fields
	The nfsd4 code has a bunch of special exceptions for error returns which map nfserr_symlink to other errors. In fact, the spec makes it clear that nfserr_symlink is to be preferred over less specific errors where possible. The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink related error returns.", which claims that these special exceptions are represent an NFSv4 break from v2/v3 tradition--when in fact the symlink error was introduced with v4. I suspect what happened was pynfs tests were written that were overly faithful to the (known-incomplete) rfc3530 error return lists, and then code was fixed up mindlessly to make the tests pass. Delete these unnecessary exceptions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: fix incorrect comment in nfsd4_set_nfs4_acl	J. Bruce Fields
	Zero means "I don't care what kind of file this is". And that's probably what we want--acls are also settable at least on directories, and if the filesystem doesn't want them on other objects, leave it to it to complain. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd: clean up nfsd_mode_check()	J. Bruce Fields
	Add some more comments, simplify logic, do & S_IFMT just once, name "type" more helpfully. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd: open-code special directory-hardlink check	J. Bruce Fields
	We allow the fh_verify caller to specify that any object except those of a given type is allowed, by passing a negative type. But only one caller actually uses it. Open-code that check in the one caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: clean up S_IS -> NF4 file type mapping	J. Bruce Fields
	A slightly unconventional approach to make the code more compact I could live with, but let's give the poor reader some chance. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	All Arch: remove linkage for sys_nfsservctl system call	NeilBrown
	The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26	UBIFS: fix the dark space calculation	srimugunthan dhandapani
	The dark space calculation should be 64 bit type-casted, when assigning to tmp64 (similar to how total_free is calculated). Overflow will occur for very large flashes. Signed-off-by: srimugunthan <srimugunthan.dhandapani@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-08-25	lockdep: Add helper function for dir vs file i_mutex annotation	Josh Boyer
	Purely in-memory filesystems do not use the inode hash as the dcache tells us if an entry already exists. As a result, they do not call unlock_new_inode, and thus directory inodes do not get put into a different lockdep class for i_sem. We need the different lockdep classes, because the locking order for i_mutex is different for directory inodes and regular inodes. Directory inodes can do "readdir()", which takes i_mutex before possibly taking mm->mmap_sem (due to a page fault while copying the directory entry to user space). In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem before accessing i_mutex. The two cases can never happen for the same inode, so no real deadlock can occur, but without the different lockdep classes, lockdep cannot understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this can lead to false positives from lockdep like below: find/645 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}: [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361 [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45 [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110 [<ffffffff81111557>] mmap_region+0x258/0x432 [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306 [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a [<ffffffff8100c858>] sys_mmap+0x22/0x24 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff81109541>] might_fault+0x89/0xac [<ffffffff81149cff>] filldir+0x6f/0xc7 [<ffffffff811586ea>] dcache_readdir+0x67/0x205 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b This patch moves the directory vs file lockdep annotation into a helper function that can be called by in-memory filesystems and has hugetlbfs call it. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25	xfs: deprecate the nodelaylog mount option	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>