path: root/fs
Age | Commit message | Author
2012-12-03 | NFSD: Lock state before calling fault injection function | Bryan Schumaker
Each function touches state in some way, so getting the lock earlier can help simplify code. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 | nfsd4: discard some unused nfsd4_verify xdr code | J. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-02 | ext4: move extra inode read to a new function | Tao Ma
Currently, in ext4_iget we do a simple check to see whether there does exist some information starting from the end of i_extra_size. With inline data added, this procedure is more complicated. So move it to a new function named ext4_iget_extra_inode. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-01 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs fixes from Al Viro: "A bunch of fixes; the last one is this cycle regression, the rest are -stable fodder."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix off-by-one in argument passed by iterate_fd() to callbacks
  lookup_one_len: don't accept . and ..
  cifs: get rid of blind d_drop() in readdir
  nfs_lookup_revalidate(): fix a leak
  don't do blind d_drop() in nfs_prime_dcache()
2012-11-30 | Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds
Pull CIFS fixes from Steve French: "Two low risk, small fixes, that fix cifs regressions introduced in 3.7."
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix wrong buffer pointer usage in smb_set_file_info
  cifs: fix writeback race with file that is growing
2012-11-29 | fix off-by-one in argument passed by iterate_fd() to callbacks | Al Viro
Noticed by Pavel Roskin; the thing in his patch I disagree with was compensating for that shite in callbacks instead of fixing it once in the iterator itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
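A minimal userspace sketch of the kind of off-by-one this commit describes, assuming the iterator advanced its index before invoking the callback. The names `toy_file`, `toy_iterate_fd`, and `record_fd` are hypothetical and this is not the kernel's code; it only illustrates fixing the index once in the iterator rather than compensating in every callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an open-file slot. */
struct toy_file { int in_use; };

typedef int (*fd_callback)(void *priv, struct toy_file *file, unsigned int fd);

/* Calls cb for each used slot from n upward, passing the slot's own index.
 * A buggy variant would read table[n++] and then pass the already
 * incremented n, so every callback would see fd + 1. */
static int toy_iterate_fd(struct toy_file *table, unsigned int max,
                          unsigned int n, fd_callback cb, void *priv)
{
    int res = 0;

    assert(table != NULL && cb != NULL);
    for (; n < max && res == 0; n++) {
        if (table[n].in_use)
            res = cb(priv, &table[n], n);   /* n still names this slot */
    }
    return res;
}

/* Example callback: record the first used descriptor and stop. */
static int record_fd(void *priv, struct toy_file *file, unsigned int fd)
{
    (void)file;
    *(unsigned int *)priv = fd;
    return 1;   /* a non-zero result stops the iteration */
}
```

Calling `toy_iterate_fd(table, max, 0, record_fd, &found)` leaves the index of the first used slot in `found`, with no adjustment needed inside the callback.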
2012-11-29 | lookup_one_len: don't accept . and .. | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 | cifs: get rid of blind d_drop() in readdir | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 | nfs_lookup_revalidate(): fix a leak | Al Viro
We are leaking fattr and fhandle if we decide that dentry is not to be invalidated, after all (e.g. happens to be a mountpoint). Just free both before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
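The pattern behind this fix can be sketched in userspace C, assuming the leak came from an early return that skipped the cleanup. `toy_fhandle`, `toy_fattr`, `toy_alloc`, `toy_free`, and `toy_revalidate` are hypothetical names; the allocation counter exists only so the example can check itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two temporary objects. */
struct toy_fhandle { int unused; };
struct toy_fattr { int unused; };

static int live_allocs;   /* number of outstanding allocations */

static void *toy_alloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        live_allocs++;
    return p;
}

static void toy_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Returns 1 if the entry is kept as valid, 0 if it should be invalidated. */
static int toy_revalidate(int is_mountpoint)
{
    struct toy_fhandle *fh = toy_alloc(sizeof(*fh));
    struct toy_fattr *fattr = toy_alloc(sizeof(*fattr));
    int still_valid = 0;

    if (fh == NULL || fattr == NULL)
        goto out;
    if (is_mountpoint) {
        /* The early "keep it after all" decision: a plain return here
         * would leak both temporaries, so go through the cleanup. */
        still_valid = 1;
        goto out;
    }
    /* A real implementation would compare attributes here. */
out:
    toy_free(fattr);
    toy_free(fh);
    assert(live_allocs == 0);
    return still_valid;
}
```

Routing every exit through one cleanup label is one common way to keep such temporaries from leaking; the commit itself only says that both objects are freed before the early exit.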
2012-11-29 | don't do blind d_drop() in nfs_prime_dcache() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 | ext4: fix possible use after free with metadata csum | Theodore Ts'o
Commit fa77dcfafeaa introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
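As an illustration of the ordering issue, here is a small userspace sketch with a reference-counted buffer. `toy_buf`, `toy_get`, `toy_release`, `toy_csum`, and `toy_init_and_checksum` are hypothetical and only model the idea that the last use of a buffer must come before its reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer standing in for a bitmap buffer. */
struct toy_buf {
    int refcount;
    unsigned char data[16];
};

static struct toy_buf *toy_get(void)
{
    struct toy_buf *b = calloc(1, sizeof(*b));

    if (b)
        b->refcount = 1;
    return b;
}

/* Drops one reference; the buffer is freed when the count reaches zero. */
static void toy_release(struct toy_buf *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        free(b);
}

/* Toy checksum: the sum of all bytes. */
static unsigned int toy_csum(const struct toy_buf *b)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < sizeof(b->data); i++)
        sum += b->data[i];
    return sum;
}

/* Initialises the buffer, checksums it, and only then releases it.
 * Releasing before toy_csum() would read memory that may be gone. */
static unsigned int toy_init_and_checksum(void)
{
    struct toy_buf *b = toy_get();
    unsigned int csum;

    if (b == NULL)
        return 0;
    memset(b->data, 0xff, sizeof(b->data));
    csum = toy_csum(b);   /* still holding our reference */
    toy_release(b);       /* release after the last use */
    return csum;
}
```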
2012-11-29 | ext4: restructure ext4_ext_direct_IO() | Theodore Ts'o
Remove a level of indentation by moving the DIO read and extending write case to the beginning of the file. This results in no actual programmatic changes to the file, but makes it easier to read/understand. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-29 | blkdev_max_block: make private to fs/buffer.c | Linus Torvalds
We really don't want to look at the block size for the raw block device accesses in fs/block-dev.c, because it may be changing from under us. So get rid of the max_block logic entirely, since the caller should already have done it anyway. That leaves the only user of this function in fs/buffer.c, so move the whole function there and make it static. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 | direct-io: don't read inode->i_blkbits multiple times | Linus Torvalds
Since directio can work on a raw block device, and the block size of the device can change under it, we need to do the same thing that fs/buffer.c now does: read the block size a single time, using ACCESS_ONCE(). Reading it multiple times can get different results, which will then confuse the code because it actually encodes the i_blksize in relationship to the underlying logical blocksize. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
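The "read it a single time" idea can be sketched in userspace C. The macro below is a stand-in modelled on the kernel's `ACCESS_ONCE()` and relies on the GNU C `__typeof__` extension; `toy_inode` and `split_offset` are hypothetical names used only for this example.

```c
#include <assert.h>

/* Userspace stand-in for the kernel macro of the same name: forces exactly
 * one load of x through a volatile-qualified pointer (GNU C extension). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical inode carrying only the field of interest. */
struct toy_inode { unsigned int i_blkbits; };

/* Splits a byte position into a block number and an offset within that
 * block, using one snapshot of i_blkbits for both computations. Reading
 * the field twice could mix two different block sizes if it changes. */
static void split_offset(struct toy_inode *inode, unsigned long long pos,
                         unsigned long long *block,
                         unsigned int *offset_in_block)
{
    unsigned int blkbits = ACCESS_ONCE(inode->i_blkbits);   /* single read */

    assert(blkbits < 32);
    *block = pos >> blkbits;
    *offset_in_block = (unsigned int)(pos & ((1ULL << blkbits) - 1));
}
```

Everything derived from the block size then uses the local `blkbits`, so the results stay mutually consistent even if the field is rewritten concurrently.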
2012-11-29 | xfs: fix stray dquot unlock when reclaiming dquots | Dave Chinner
When we fail to get a dquot lock during reclaim, we jump to an error handler that unlocks the dquot. This is wrong as we didn't lock the dquot, and unlocking it means whoever is holding the lock has had it silently taken away, and hence it results in a lock imbalance. Found by inspection while modifying the code for the numa-lru patchset. This fixes a random hang I've been seeing on xfstest 232 for the past several months. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
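A single-threaded sketch of the lock-imbalance rule described here: a failed trylock must not be followed by an unlock. `toy_dquot`, `toy_trylock`, `toy_unlock`, and `toy_reclaim` are hypothetical, and the boolean "lock" merely models ownership; it is not a real synchronisation primitive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object with a toy lock flag. */
struct toy_dquot {
    bool locked;
    int reclaimed;
};

static bool toy_trylock(struct toy_dquot *dq)
{
    if (dq->locked)
        return false;   /* somebody else holds it */
    dq->locked = true;
    return true;
}

static void toy_unlock(struct toy_dquot *dq)
{
    assert(dq->locked);
    dq->locked = false;
}

/* Returns true if the object was reclaimed. */
static bool toy_reclaim(struct toy_dquot *dq)
{
    if (!toy_trylock(dq))
        return false;   /* we never took the lock, so we must not unlock */

    dq->reclaimed = 1;
    toy_unlock(dq);     /* balanced with the successful trylock above */
    return true;
}
```

The failure path returns without touching the lock, so the current holder keeps it.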
2012-11-29 | xfs: fix direct IO nested transaction deadlock. | Dave Chinner
The direct IO path can do a nested transaction reservation when writing past the EOF. The first transaction is the append transaction for setting the filesize at IO completion, but we can also need a transaction for allocation of blocks. If the log is low on space due to reservations and a small log, the append transaction can be granted after waiting for space as the only active transaction in the system. This then attempts a reservation for an allocation, which there isn't space in the log for, and the reservation sleeps. The result is that there is nothing left in the system to wake up all the processes waiting for log space to come free.

The stack trace that shows this deadlock is relatively innocuous:

    xlog_grant_head_wait
    xlog_grant_head_check
    xfs_log_reserve
    xfs_trans_reserve
    xfs_iomap_write_direct
    __xfs_get_blocks
    xfs_get_blocks_direct
    do_blockdev_direct_IO
    __blockdev_direct_IO
    xfs_vm_direct_IO
    generic_file_direct_write
    xfs_file_dio_aio_write
    xfs_file_aio_write
    do_sync_write
    vfs_write

This was discovered on a filesystem with a log of only 10MB, and a log stripe unit of 256k which increased the base reservations by 512k. Hence an allocation transaction requires 1.2MB of log space to be available instead of only 260k, and so greatly increased the chance that there wouldn't be enough log space available for the nested transaction to succeed. The key to reproducing it is this mkfs command:

    mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV

The test case was 1000 fsstress processes running with random freezes and unfreezes every few seconds. Thanks to Eryu Guan (eguan@redhat.com) for writing the test that found this on a system with a somewhat unique default configuration....

cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-29 | xfs: byte range granularity for XFS_IOC_ZERO_RANGE | Dave Chinner
XFS_IOC_ZERO_RANGE simply does not work properly for non page cache aligned ranges. Neither test 242 nor 290 exercises this correctly, so the behaviour is completely busted even though the tests pass. Fix it to support full byte range granularity as was originally intended for this ioctl. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
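To show what byte-range granularity involves for an unaligned range, the sketch below splits a half-open byte range into a partial head, a page-aligned middle, and a partial tail. This is not the XFS implementation; `TOY_PAGE_SIZE`, `zero_plan`, and `plan_zero_range` are hypothetical, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the example */

/* Three pieces of a byte range [start, end): bytes before the first page
 * boundary, whole pages, and bytes after the last page boundary. */
struct zero_plan {
    uint64_t head_start, head_len;
    uint64_t mid_start, mid_len;
    uint64_t tail_start, tail_len;
};

static struct zero_plan plan_zero_range(uint64_t start, uint64_t end)
{
    struct zero_plan p = { 0, 0, 0, 0, 0, 0 };
    uint64_t first_aligned, last_aligned;

    assert(start <= end);
    /* Round start up and end down to page boundaries. */
    first_aligned = (start + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE * TOY_PAGE_SIZE;
    last_aligned = end / TOY_PAGE_SIZE * TOY_PAGE_SIZE;

    if (first_aligned >= last_aligned) {
        /* No whole page is covered: treat the range as one partial span. */
        p.head_start = start;
        p.head_len = end - start;
        return p;
    }
    p.head_start = start;
    p.head_len = first_aligned - start;
    p.mid_start = first_aligned;
    p.mid_len = last_aligned - first_aligned;
    p.tail_start = last_aligned;
    p.tail_len = end - last_aligned;
    return p;
}
```

A test that only uses page-aligned ranges exercises just the middle piece, which is one way such a bug can go unnoticed while tests pass.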
2012-11-29 | blockdev: remove bd_block_size_semaphore again | Linus Torvalds
This reverts the block-device direct access code to the previous unlocked code, now that fs/buffer.c no longer needs external locking. With this, fs/block_dev.c is back to the original version, apart from a whitespace cleanup that I didn't want to revert. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 | fs/buffer.c: make block-size be per-page and protected by the page lock | Linus Torvalds
This makes the buffer size handling be a per-page thing, which allows us to not have to worry about locking too much when changing the buffer size. If a page doesn't have buffers, we still need to read the block size from the inode, but we can do that with ACCESS_ONCE(), so that even if the size is changing, we get a consistent value. This doesn't convert all functions - many of the buffer functions are used purely by filesystems, which in turn results in the buffer size being fixed at mount-time. So they don't have the same consistency issues that the raw device access can have. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 | do_coredump(): get rid of pt_regs argument | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | get rid of pt_regs argument of ->load_binary() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | get rid of pt_regs argument of search_binary_handler() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | get rid of pt_regs argument of do_execve_common() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | get rid of pt_regs argument of do_execve() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | make compat_do_execve() static, lose pt_regs argument | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | kill daemonize() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 | ext4: rationalize ext4_extents.h inclusion | Theodore Ts'o
Previously, ext4_extents.h was being included at the end of ext4.h, which was bad for a number of reasons: (a) it was not being included in the expected place, and (b) it caused the header to be included multiple times. There were #ifdef's to prevent this from causing any problems, but it still was unnecessary. By moving the function declarations that were in ext4_extents.h to ext4.h, which is standard practice for where the function declarations for the rest of ext4.h can be found, we can remove ext4_extents.h from being included in ext4.h at all, and then we can only include ext4_extents.h where it is needed in ext4's source files. It should be possible to move a few more things into ext4.h, and further reduce the number of source files that need to #include ext4_extents.h, but that's a cleanup for another day. Reported-by: Sachin Kamat <sachin.kamat@linaro.org> Reported-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 | NFSD: Fold fault_inject.h into state.h | Bryan Schumaker
There were only a small number of functions in this file and since they all affect stored state I think it makes sense to put them in state.h instead. I also dropped most static inline declarations since there are no callers when fault injection is not enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | ext4: fixed potential NULL dereference in ext4_calculate_overhead() | Vahram Martirosyan
The memset operation before the check can cause a BUG if the memory allocation failed. Since we are using get_zeroed_page, there is no need to use memset anyway. Found by the Spruce system in cooperation with the KEDR Framework. Signed-off-by: Vahram Martirosyan <vmartirosyan@linuxtesting.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
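A userspace sketch of the same ordering problem, using `calloc()` as a stand-in for a zeroing allocator. `toy_get_zeroed_page` and `toy_calculate_overhead` are hypothetical, as is the `simulate_failure` parameter, which exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a zeroing allocator: returns zero-filled memory or NULL. */
static unsigned char *toy_get_zeroed_page(size_t size, int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, size);
}

/* Returns 0 on success and -1 if the allocation failed. */
static int toy_calculate_overhead(size_t size, int simulate_failure,
                                  unsigned int *overhead)
{
    unsigned char *buf = toy_get_zeroed_page(size, simulate_failure);
    size_t i;

    assert(overhead != NULL);
    /* Check before any use. Calling memset(buf, 0, size) at this point,
     * ahead of the check, would dereference NULL on allocation failure. */
    if (buf == NULL)
        return -1;

    /* No memset needed: the allocator already returned zeroed memory. */
    *overhead = 0;
    for (i = 0; i < size; i++)
        *overhead += buf[i];

    free(buf);
    return 0;
}
```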
2012-11-28 | ext4: simple cleanup in fiemap codepath | Lukas Czerner
This commit is a simple cleanup of the fiemap codepath which was not included in the previous commit, to make the changes clearer. In this commit we rename the cbex variable to newex in ext4_fill_fiemap_extents() because the callback is no longer present. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 | ext4: prevent race while walking extent tree for fiemap | Lukas Czerner
Currently ext4_ext_walk_space() only takes i_data_sem for read when searching for the extent at given block with ext4_ext_find_extent(). Then it drops the lock and the extent tree can be changed at will. However later on we're searching for the 'next' extent, but the extent tree might already have changed, so the information might not be accurate. In fact we can hit BUG_ON(end <= start) if the extent got inserted into the tree after the one we found and before the block we were searching for. This has been reproduced by running xfstests 225 in loop on s390x architecture, but theoretically we could hit this on any other architecture as well, but probably not as often. Moreover the extent currently in delayed allocation might be allocated after we search the extent tree and before we search extent status tree delayed buffers resulting in those delayed buffers being completely missed, even though completely written and allocated. We fix all those problems in several steps:
1. remove unnecessary callback indirection
2. rename functions: ext4_ext_walk_space -> ext4_fill_fiemap_extents, ext4_ext_fiemap_cb -> ext4_find_delayed_extent
3. move fiemap_fill_next_extent() into ext4_fill_fiemap_extents()
4. hold the i_data_sem for: ext4_ext_find_extent(), ext4_ext_next_allocated_block(), ext4_find_delayed_extent()
5. call fiemap_fill_next_extent after releasing the i_data_sem
6. move path reinitialization into the critical section.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 | cputime: Rename thread_group_times to thread_group_cputime_adjusted | Frederic Weisbecker
We have thread_group_cputime() and thread_group_times(). The naming doesn't provide enough information about the difference between these two APIs. To reduce the confusion, rename thread_group_times() to thread_group_cputime_adjusted(). This name better suggests that it's a version of thread_group_cputime() that does some stabilization on the raw cputime values, i.e. it scales them on top of CFS runtime stats and bounds the lower value for monotonicity. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 | CIFS: Fix wrong buffer pointer usage in smb_set_file_info | Pavel Shilovsky
Commit 6bdf6dbd662176c0da5c3ac8ed10ac94e7776c85 caused a regression in setattr codepath that leads to files with wrong attributes. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-28 | nfsd: make NFSv4 grace time per net | Stanislav Kinsbursky
Grace time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: make NFSv4 lease time per net | Stanislav Kinsbursky
Lease time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
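The grace-time and lease-time commits both move a global setting into per-namespace state. A minimal sketch of that refactoring, with hypothetical types `toy_net` and `toy_nfsd_net` and an assumed default of 90 seconds, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Before such a change, settings like these would be file-scope globals
 * shared by every namespace, e.g. "static long lease_time = 90;". */

/* Hypothetical per-namespace NFSD state. */
struct toy_nfsd_net {
    long lease_time;   /* seconds */
    long grace_time;   /* seconds */
};

/* Hypothetical network namespace that embeds its own NFSD state. */
struct toy_net {
    struct toy_nfsd_net nfsd;
};

static void toy_nfsd_net_init(struct toy_net *net)
{
    assert(net != NULL);
    net->nfsd.lease_time = 90;   /* assumed default for the example */
    net->nfsd.grace_time = 90;
}

/* Changing the lease time now affects only the given namespace. */
static void toy_set_lease_time(struct toy_net *net, long seconds)
{
    assert(net != NULL && seconds > 0);
    net->nfsd.lease_time = seconds;
}
```

Every function that used to read the global takes the namespace as a parameter instead, so two namespaces can run with different values.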
2012-11-28 | nfsd: remove redundant declarations | Stanislav Kinsbursky
This is a cleanup patch. Functions nfsd_pool_stats_open() and nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: recovery - make in_grace per net | Stanislav Kinsbursky
Flag in_grace is a part of client tracking state, which is network namespace aware. So let's replace the global static variable with a per-net one. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: recovery - make rec_file per net | Stanislav Kinsbursky
Opening and closing of this file is done in client tracking init and exit operations. Client tracking is done in network namespace context already. So let's make this file opened and closed per network context - this will simplify its management. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: call state init and shutdown twice | Stanislav Kinsbursky
Split NFSv4 state init and shutdown into two different calls: a per-net one and a generic one. The per-net init/shutdown pair has to be called for every namespace; the generic pair only once, on NFSd kthreads start and shutdown respectively. Refresh of diff-nfsd-call-state-init-twice Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: cleanup NFSd state start a bit | Stanislav Kinsbursky
This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where get_net() is now performed. It also introduces a new nfs4_state_start_net(), which is now responsible for state creation and initializing all per-net data and which is now called from nfs4_state_start(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: cleanup NFSd state shutdown a bit | Stanislav Kinsbursky
This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(), __nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network related shutdown operations to nfs4_state_shutdown_net(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: make delegations shutdown network namespace aware | Stanislav Kinsbursky
NFSv4 delegations are stored in global list. But they are nfs4_client dependent, which is network namespace aware already. State shutdown and laundromat are done per network namespace as well. So, delegations unhash have to be done in network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: make client_lock per net | Stanislav Kinsbursky
This lock protects the client lru list and session hash table, which are allocated per network namespace already. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd4: remove state lock from nfs4_state_shutdown | Stanislav Kinsbursky
Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant. This function is called by the last NFSd thread on its exit, and the state lock actually protects two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net
"nfsd4_client_tracking_exit" doesn't require state lock protection, because its state can be modified only by tracker callbacks. Here they are:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done: is called only from nfs4_laundromat.
nfsd4_proc_compound is called only by the NFSd kthread, which is exiting right now. nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled already.
"__nfs4_state_shutdown_net" also doesn't require state lock protection, because all NFSd kthreads are dead, and no race can happen with NFSd start, because "nfsd_up" flag is still set. Moreover, all NFSd shutdown is protected with the global nfsd_mutex. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd4: remove state lock from nfsd4_load_reboot_recovery_data | J. Bruce Fields
That function is only called under nfsd_mutex: we know that because the only caller is nfsd_svc, via

    nfsd_svc
      nfsd_startup
        nfs4_state_start
          nfsd4_client_tracking_init
            client_tracking_ops->init == nfsd4_load_reboot_recovery_data

The shared state accessed here includes:
- user_recovery_dirname: used here, modified only by nfs4_reset_recoverydir, which can be verified to only be called under nfsd_mutex.
- filesystem state, protected by i_mutex (handwaving slightly here)
- rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other than here, used only from code called from nfsd or laundromat threads, both of which should be started only after this runs (see nfsd_svc) and stopped before this could run again (see nfsd_shutdown, called from nfsd_last_thread).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-27 | nfsd4: return badname, not inval, on "." or "..", or "/" | J. Bruce Fields
The spec requires badname, not inval, in these cases. Some callers want us to return enoent, but I can see no justification for that. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
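A sketch of the name check this message describes, under stated assumptions: the status enum and `toy_check_filename` are hypothetical, and treating a zero-length name as "inval" is an assumption that the commit text does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for protocol error values. */
enum toy_status { TOY_OK = 0, TOY_ERR_INVAL, TOY_ERR_BADNAME };

/* Validates a single path component of the given length. */
static enum toy_status toy_check_filename(const char *name, size_t len)
{
    size_t i;

    assert(name != NULL);
    if (len == 0)
        return TOY_ERR_INVAL;       /* assumption: empty names stay "inval" */

    /* "." and ".." are bad names, not invalid arguments. */
    if (len == 1 && name[0] == '.')
        return TOY_ERR_BADNAME;
    if (len == 2 && name[0] == '.' && name[1] == '.')
        return TOY_ERR_BADNAME;

    /* A component may not contain a path separator. */
    for (i = 0; i < len; i++) {
        if (name[i] == '/')
            return TOY_ERR_BADNAME;
    }
    return TOY_OK;
}
```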
2012-11-27 | cifs: fix writeback race with file that is growing | Jeff Layton
Commit eddb079deb4 created a regression in the writepages codepath. Previously, whenever it needed to check the size of the file, it did so by consulting the inode->i_size field directly. With that patch, the i_size was fetched once on entry into the writepages code and that value was used henceforth. If the file is changing size though (for instance, if someone is writing to it or has truncated it), then that value is likely to be wrong. This can lead to data corruption. Pages past the EOF at the time that the writepages call was issued may be silently dropped and ignored because cifs_writepages wrongly assumes that the file must have been truncated in the interim. Fix cifs_writepages to properly fetch the size from the inode->i_size field instead to properly account for this possibility. Original bug report is here: https://bugzilla.kernel.org/show_bug.cgi?id=50991 Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-26 | Merge branch 'akpm' (Fixes from Andrew) | Linus Torvalds
Merge misc fixes from Andrew Morton: "8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
  futex: avoid wake_futex() for a PI futex_q
  watchdog: using u64 in get_sample_period()
  writeback: put unused inodes to LRU after writeback completion
  mm: vmscan: check for fatal signals iff the process was throttled
  Revert "mm: remove __GFP_NO_KSWAPD"
  proc: check vma->vm_file before dereferencing
  UAPI: strip the _UAPI prefix from header guards during header installation
  include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
2012-11-26 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs | Linus Torvalds
Pull ext3 regression fix from Jan Kara: "Fix an ext3 regression introduced during 3.7 merge window. It leads to deadlock if you stress the filesystem in the right way (luckily only if blocksize < pagesize)."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix lock ordering bug in journal_unmap_buffer()