linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2011-08-27	nfsd4: simplify lock openmode check	J. Bruce Fields
	Note that the special handling for the lock stateid case is already done by nfs4_check_openmode() (as of 02921914170e3b7fea1cd82dac9713685d2de5e2 "nfsd4: fix openmode checking on IO using lock stateid") so we no longer need these two cases in the caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: cleanup and consolidate seqid_mutating_err	J. Bruce Fields
	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: remove HAS_SESSION	J. Bruce Fields
	This flag doesn't really buy us anything. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: cleanup lock/stateowner initialization	J. Bruce Fields
	Share some common code, stop doing silly things like initializing a list head immediately before adding it to a list, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: name openowner data structures more clearly	J. Bruce Fields
	These appear to be generic (for both open and lock owners), but they're actually just for open owners. This has confused me more than once. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: replace some macros by functions	J. Bruce Fields
	For all the usual reasons. (Type safety, readability.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: stop using nfserr_resource for transitory errors	J. Bruce Fields
	The server is returning nfserr_resource for both permanent errors and for errors (like allocation failures) that might be resolved by retrying later. Save nfserr_resource for the former and use delay/jukebox for the latter. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: fix failure to end nfsd4 grace period	Boaz Harrosh
	Even if we fail to write a recovery record, we should still mark the client as having acquired its first state. Otherwise we leave 4.1 clients with indefinite ERR_GRACE returns. However, an inability to write stable storage records may cause failures of reboot recovery, and the problem should still be brought to the server administrator's attention. So, make sure the error is logged. These errors shouldn't normally be triggered on a corectly functioning server--this isn't a case where a misconfigured client could spam the logs. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: simplify recovery dir setting	J. Bruce Fields
	Move around some of this code, simplify a bit. Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd: prettify NFSD_MAY_* flag definitions	J. Bruce Fields
	Acked-by: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27	nfsd4: permit read opens of executable-only files	J. Bruce Fields
	A client that wants to execute a file must be able to read it. Read opens over nfs are therefore implicitly allowed for executable files even when those files are not readable. NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on read requests, but NFSv4 has gotten this wrong ever since dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on open", when we realized that the file owner shouldn't override permissions on non-reclaim NFSv4 opens. So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow reads of executable files. So, do the same thing we do whenever we encounter another weird NFS permission nit: define yet another NFSD_MAY_* flag. The industry's future standardization on 128-bit processors will be motivated primarily by the need for integers with enough bits for all the NFSD_MAY_* flags. Reported-by: Leonardo Borda <leonardoborda@gmail.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	Remove include/linux/nfsd/const.h	J. Bruce Fields
	Userspace shouldn't have a use for these constants. Nothing here is used outside fs/nfsd. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: it's OK to return nfserr_symlink	J. Bruce Fields
	The nfsd4 code has a bunch of special exceptions for error returns which map nfserr_symlink to other errors. In fact, the spec makes it clear that nfserr_symlink is to be preferred over less specific errors where possible. The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink related error returns.", which claims that these special exceptions are represent an NFSv4 break from v2/v3 tradition--when in fact the symlink error was introduced with v4. I suspect what happened was pynfs tests were written that were overly faithful to the (known-incomplete) rfc3530 error return lists, and then code was fixed up mindlessly to make the tests pass. Delete these unnecessary exceptions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: fix incorrect comment in nfsd4_set_nfs4_acl	J. Bruce Fields
	Zero means "I don't care what kind of file this is". And that's probably what we want--acls are also settable at least on directories, and if the filesystem doesn't want them on other objects, leave it to it to complain. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd: clean up nfsd_mode_check()	J. Bruce Fields
	Add some more comments, simplify logic, do & S_IFMT just once, name "type" more helpfully. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd: open-code special directory-hardlink check	J. Bruce Fields
	We allow the fh_verify caller to specify that any object except those of a given type is allowed, by passing a negative type. But only one caller actually uses it. Open-code that check in the one caller. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	nfsd4: clean up S_IS -> NF4 file type mapping	J. Bruce Fields
	A slightly unconventional approach to make the code more compact I could live with, but let's give the poor reader some chance. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26	All Arch: remove linkage for sys_nfsservctl system call	NeilBrown
	The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26	UBIFS: fix the dark space calculation	srimugunthan dhandapani
	The dark space calculation should be 64 bit type-casted, when assigning to tmp64 (similar to how total_free is calculated). Overflow will occur for very large flashes. Signed-off-by: srimugunthan <srimugunthan.dhandapani@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-08-25	lockdep: Add helper function for dir vs file i_mutex annotation	Josh Boyer
	Purely in-memory filesystems do not use the inode hash as the dcache tells us if an entry already exists. As a result, they do not call unlock_new_inode, and thus directory inodes do not get put into a different lockdep class for i_sem. We need the different lockdep classes, because the locking order for i_mutex is different for directory inodes and regular inodes. Directory inodes can do "readdir()", which takes i_mutex before possibly taking mm->mmap_sem (due to a page fault while copying the directory entry to user space). In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem before accessing i_mutex. The two cases can never happen for the same inode, so no real deadlock can occur, but without the different lockdep classes, lockdep cannot understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this can lead to false positives from lockdep like below: find/645 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}: [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361 [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45 [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110 [<ffffffff81111557>] mmap_region+0x258/0x432 [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306 [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a [<ffffffff8100c858>] sys_mmap+0x22/0x24 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff81109541>] might_fault+0x89/0xac [<ffffffff81149cff>] filldir+0x6f/0xc7 [<ffffffff811586ea>] dcache_readdir+0x67/0x205 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b This patch moves the directory vs file lockdep annotation into a helper function that can be called by in-memory filesystems and has hugetlbfs call it. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25	xfs: deprecate the nodelaylog mount option	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-24	NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN error	Trond Myklebust
	The NFSv4 spec does not specify that the server must repeat that error, so in order to avoid having the delegations revoked, we should handle it immediately. Also note that NFS4ERR_CB_PATH_DOWN does in fact renew the lease... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24	NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation	Trond Myklebust
	RFC3530 states that if the client holds a delegation, then it is obliged to continue to send RENEW calls once every lease period in order to allow the server to return NFS4ERR_CB_PATH_DOWN if the callback path is unreachable. This is not required for NFSv4.1, since the server can at any time set the SEQ4_STATUS_CB_PATH_DOWN_SESSION in any SEQUENCE operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24	NFSv4: nfs4_proc_renew should be declared static	Trond Myklebust
	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24	NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocation	Trond Myklebust
	We shouldn't allow the renew daemon to do direct reclaim on the NFS partition. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation
2011-08-24	fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message	Miklos Szeredi
	FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the message processing could overrun and result in a "kernel BUG at fs/fuse/dev.c:629!" Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
2011-08-23	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds
	* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix tracing builds inside the source tree xfs: remove subdirectories xfs: don't expect xfs headers to be in subdirectories
2011-08-23	block: add GENHD_FL_NO_PART_SCAN	Tejun Heo
	There are cases where suppressing partition scan is useful - e.g. for lo devices and pseudo SATA devices which advertise to be a disk but get upset on partition scan (some port multiplier control devices show such behavior). This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan regardless of the number of possible partitions. disk_partitionable() is renamed to disk_part_scan_enabled() as suppressing partition scan doesn't imply the device can't be partitioned using BLKPG_ADD/DEL_PARTITION calls from userland. show_partition() now directly tests disk_max_parts() to maintain backward-compatibility. -v2: Updated to make it clear that only partition scan is suppressed not partitioning itself as suggested by Kay Sievers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23	block: separate priority boosting from REQ_META	Christoph Hellwig
	Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23	block: remove READ_META and WRITE_META	Christoph Hellwig
	Replace all occurnanced of the undocumented READ_META with READ \| REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-22	sysfs: use rb-tree for inode number lookup	Mikulas Patocka
	sysfs: use rb-tree for inode number lookup This patch makes sysfs use red-black tree for inode number lookup. Together with a previous patch to use red-black tree for name lookup, this patch makes all sysfs lookups to have O(log n) complexity. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22	sysfs: remove s_sibling hacks	Mikulas Patocka
	sysfs: remove s_sibling hacks s_sibling was used for three different purposes: 1) as a linked list of entries in the directory 2) as a linked list of entries to be deleted 3) as a pointer to "struct completion" This patch removes the hack and introduces new union u which holds pointers for cases 2) and 3). This change is needed for the following patch that removes s_sibling at all and replaces it with a rb tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22	sysfs: use rb-tree for name lookups	Mikulas Patocka
	sysfs: use rb-tree for name lookups Use red-black tree for name lookups. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22	sysfs: count subdirectories	Mikulas Patocka
	sysfs: count subdirectories This patch introduces a subdirectory counter for each sysfs directory. Without the patch, sysfs_refresh_inode would walk all entries of the directory to calculate the number of subdirectories. This patch improves time of "ls -la /sys/block" when there are 10000 block devices from 9 seconds to 0.19 seconds. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22	debugfs: Fix a comment mistake	Harry Wei
	The file is fs/debugfs/inode.c but the comment says it is file.c. This patch can fix this little mistake. Signed-off-by: Harry Wei <harryxiyou@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22	xfs: fix tracing builds inside the source tree	Christoph Hellwig
	The code really requires the current source directory to be in the header search path. We already do this if building with an object tree separate from the source, but it needs to be added manually if building inside the source. The cflags addition for it accidentally got removed when collapsing the xfs directory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-22	ceph: fix memory leak	Noah Watkins
	kfree does not clean up indirect allocations in ceph_fs_client and ceph_options (e.g. snapdir_name). Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-08-22	UBIFS: introduce a helper to dump scanning info	Artem Bityutskiy
	This commit adds 'dbg_dump_sleb()' helper function to dump scanning information. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-08-21	Btrfs: fix 64 bit divide problem	Josef Bacik
	This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check if there is enough space for balancing smarter"). We can't do 64-bit divides on 32-bit architectures. In cases where we need to divide/multiply by 2 we should just left/right shift respectively, and in cases where theres N number of devices use do_div. Also make the counters u64 to match up with rw_devices. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Acked-and-tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-21	Merge branch 'for_linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: flush any pending end_io requests before DIO reads w/dioread_nolock ext4: fix nomblk_io_submit option so it correctly converts uninit blocks ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode ext4: Fix ext4_should_writeback_data() for no-journal mode
2011-08-19	ext4: flush any pending end_io requests before DIO reads w/dioread_nolock	Jiaying Zhang
	There is a race between ext4 buffer write and direct_IO read with dioread_nolock mount option enabled. The problem is that we clear PageWriteback flag during end_io time but will do uninitialized-to-initialized extent conversion later with dioread_nolock. If an O_direct read request comes in during this period, ext4 will return zero instead of the recently written data. This patch checks whether there are any pending uninitialized-to-initialized extent conversion requests before doing O_direct read to close the race. Note that this is just a bandaid fix. The fundamental issue is that we clear PageWriteback flag before we really complete an IO, which is problem-prone. To fix the fundamental issue, we may need to implement an extent tree cache that we can use to look up pending to-be-converted extents. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-19	sunrpc: use better NUMA affinities	Eric Dumazet
	Use NUMA aware allocations to reduce latencies and increase throughput. sunrpc kthreads can use kthread_create_on_node() if pool_mode is "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can also take into account NUMA node affinity for memory allocations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: "J. Bruce Fields" <bfields@fieldses.org> CC: Neil Brown <neilb@suse.de> CC: David Miller <davem@davemloft.net> Reviewed-by: Greg Banks <gnb@fastmail.fm> [bfields@redhat.com: fix up caller nfs41_callback_up] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	locks: setlease cleanup	J. Bruce Fields
	There's an incorrect comment here. Also clean up the logic: the "rdlease" and "wrlease" locals are confusingly named, and don't really add anything since we can make a decision as soon as we hit one of these cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	locks: fix tracking of inprogress lease breaks	J. Bruce Fields
	We currently use a bit in fl_flags to record whether a lease is being broken, and set fl_type to the type (RDLCK or UNLCK) that it will eventually have. This means that once the lease break starts, we forget what the lease's type used to be. Breaking a read lease will then result in blocking read opens, even though there's no conflict--because the lease type is now F_UNLCK and we can no longer tell whether it was previously a read or write lease. So, instead keep fl_type as the original type (the type which we enforce), and keep track of whether we're unlocking or merely downgrading by replacing the single FL_INPROGRESS flag by FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags. To get this right we also need to track separate downgrade and break times, to handle the case where a write-leased file gets conflicting opens first for read, then later for write. (I first considered just eliminating the downgrade behavior completely--nfsv4 doesn't need it, and nobody as far as I can tell actually uses it currently--but Jeremy Allison tells me that Windows oplocks do behave this way, so Samba will probably use this some day.) Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	locks: move F_INPROGRESS from fl_type to fl_flags field	J. Bruce Fields
	F_INPROGRESS isn't exposed to userspace. To me it makes more sense in fl_flags.... Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	locks: minor lease cleanup	J. Bruce Fields
	Use a helper function, to simplify upcoming changes. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	nfsd4: return nfserr_symlink on v4 OPEN of non-regular file	J. Bruce Fields
	Without this, an attempt to open a device special file without first stat'ing it will fail. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	nfsd4: fix seqid_mutating_error	J. Bruce Fields
	The set of errors here does not agree with the set of errors specified in the rfc! While we're there, turn this macros into a function, for the usual reasons, and move it to the one place where it's actually used. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19	UBIFS: not build debug messages with CONFIG_UBIFS_FS_DEBUG disabled	Michal Marek
	With $ grep -e UBIFS_FS_DEBUG -e DYNAMIC_DEBUG .config # CONFIG_UBIFS_FS_DEBUG is not set CONFIG_DYNAMIC_DEBUG=y Debug messages are kept in the object files due to the dynamic_pr_debug() macro, even if they are never going to be printed: $ make fs/ubifs/super.o $ strings fs/ubifs/super.o \| grep 'compiled on' compiled on: Aug 11 2011 at 12:21:38 Use plain printk to fix this. Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>