summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-01-06ext4: Make printk's consistently prefixed with "EXT4-fs: "Theodore Ts'o
Previously, some were "ext4: ", and some were "EXT4: "; change them to be consistent with most ext4 printk's, which is to use "EXT4-fs: ". Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06ext4: Add sanity checks for the superblock before mounting the filesystemTheodore Ts'o
This avoids insane superblock configurations that could lead to kernel oops due to null pointer derefences. http://bugzilla.kernel.org/show_bug.cgi?id=12371 Thanks to David Maciejak at Fortinet's FortiGuard Global Security Research Team who discovered this bug independently (but at approximately the same time) as Thiemo Nagel, who submitted the patch. Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2009-01-05ext4: Add mount option to set kjournald's I/O priorityTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com>
2009-01-06[XFS] use scalable vmap APINick Piggin
Implement XFS's large buffer support with the new vmap APIs. See the vmap rewrite (db64fe02) for some numbers. The biggest improvement that comes from using the new APIs is avoiding the global KVA allocation lock on every call. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2009-01-06[XFS] remove old vmap cacheNick Piggin
XFS's vmap batching simply defers a number (up to 64) of vunmaps, and keeps track of them in a list. To purge the batch, it just goes through the list and calls vunamp on each one. This is pretty poor: a global TLB flush is generally still performed on each vunmap, with the most expensive parts of the operation being the broadcast IPIs and locking involved in the SMP callouts, and the locking involved in the vmap management -- none of these are avoided by just batching up the calls. I'm actually surprised it ever made much difference. (Now that the lazy vmap allocator is upstream, this description is not quite right, but the vunmap batching still doesn't seem to do much) Rip all this logic out of XFS completely. I will improve vmap performance and scalability directly in subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2009-01-05Btrfs: drop EXPORT symbols from extent_io.cChris Mason
They should stay out until this is turned into generic code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fs/dlm/ast.c: fix warning dlm: add new debugfs entry dlm: add time stamp of blocking callback dlm: change lock time stamping dlm: improve how bast mode handling dlm: remove extra blocking callback check dlm: replace schedule with cond_resched dlm: remove kmap/kunmap dlm: trivial annotation of be16 value dlm: fix up memory allocation flags
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits) GFS2: Use DEFINE_SPINLOCK GFS2: Fix use-after-free bug on umount (try #2) Revert "GFS2: Fix use-after-free bug on umount" GFS2: Streamline alloc calculations for writes GFS2: Send useful information with uevent messages GFS2: Fix use-after-free bug on umount GFS2: Remove ancient, unused code GFS2: Move four functions from super.c GFS2: Fix bug in gfs2_lock_fs_check_clean() GFS2: Send some sensible sysfs stuff GFS2: Kill two daemons with one patch GFS2: Move gfs2_recoverd into recovery.c GFS2: Fix "truncate in progress" hang GFS2: Clean up & move gfs2_quotad GFS2: Add more detail to debugfs glock dumps GFS2: Banish struct gfs2_rgrpd_host GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd GFS2: Move rg_igeneration into struct gfs2_rgrpd GFS2: Banish struct gfs2_dinode_host GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize ...
2009-01-05Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits) ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. ocfs2: use min_t in ocfs2_quota_read() ocfs2: remove unneeded lvb casts ocfs2: Add xattr support checking in init_security ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle ocfs2: calculate and reserve credits for xattr value in mknod ocfs2/xattr: fix credits calculation during index create ocfs2/xattr: Always updating ctime during xattr set. ocfs2/xattr: Remove extend_trans call and add its credits from the beginning ocfs2/dlm: Fix race during lockres mastery ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() ocfs2/dlm: Fix a race between migrate request and exit domain ocfs2: One more hamming code optimization. ocfs2: Another hamming code optimization. ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). ocfs2: Enable metadata checksums. ocfs2: Validate superblock with checksum and ecc. ocfs2: Checksum and ECC for directory blocks. ...
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05Btrfs: Fix checkpatch.pl warningsChris Mason
There were many, most are fixed now. struct-funcs.c generates some warnings but these are bogus. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05Btrfs: Fix free block discard calls down to the block layerLiu Hui
This is a patch to fix discard semantic to make Btrfs work with FTL and SSD. We can improve FTL's performance by telling it which sectors are freed by file system. But if we don't tell FTL the information of free sectors in proper time, the transaction mechanism of Btrfs will be destroyed and Btrfs could not roll back the previous transaction under the power loss condition. There are some problems in the old implementation: 1, In __free_extent(), the pinned down extents should not be discarded. 2, In free_extents(), the free extents are all pinned, so they need to be discarded in transaction committing time instead of free_extents(). 3, The reserved extent used by log tree should be discard too. This patch change discard behavior as follows: 1, For the extents which need to be free at once, we discard them in update_block_group(). 2, Delay discarding the pinned extent in btrfs_finish_extent_commit() when committing transaction. 3, Remove discarding from free_extents() and __free_extent() 4, Add discard interface into btrfs_free_reserved_extent() 5, Discard sectors before updating the free space cache, otherwise, FTL will destroy file system data.
2009-01-05Btrfs: avoid orphan inode caused by log replayYan Zheng
drop_one_dir_item does not properly update inode's link count. It can be reproduced by executing following commands: #touch test #sync #rm -f test #dd if=/dev/zero bs=4k count=1 of=test conv=fsync #echo b > /proc/sysrq-trigger This fixes it by adding an BTRFS_ORPHAN_ITEM_KEY for the inode Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-05Btrfs: avoid potential super block corruptionYan Zheng
The data in fs_info->super_for_commit are zeros before the first transaction commit. If tree log sync and system crash both occur before the first transaction commit, super block will get corrupted. This fixes it by properly filling in the super_for_commit field at open time. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-05Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_superShen Feng
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
2009-01-05Btrfs: fix a memory leak in btrfs_get_sbShen Feng
subvol_name should be freed if error occurs. Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
2009-01-05Btrfs: Fix typo in clear_state_cbLiu Hui
In clear_state_cb, we should check 'tree->ops->clear_bit_hook' instead of 'tree->ops->set_bit_hook'. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05Btrfs: Fix memset length in btrfs_file_writeyanhai zhu
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05Btrfs: update directory's size when creating subvol/snapshotYan Zheng
Make sure directory's size properly updated when creating subvol/snapshot. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-05Btrfs: add permission checks to the ioctlsChris Mason
Only root can add/remove devices Only root can defrag subtrees Only files open for writing can be defragged Only files open for writing can be the destination for a clone Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05inotify: fix type errors in interfacesMichael Kerrisk
The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes. For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is currently 'u32', it should be '__s32' . That is Robert's suggestion, and is consistent with the other declarations of watch descriptors in the kernel source, in particular, the inotify_event structure in include/linux/inotify.h: struct inotify_event { __s32 wd; /* watch descriptor */ __u32 mask; /* watch mask */ __u32 cookie; /* cookie to synchronize two events */ __u32 len; /* length (including nulls) of name */ char name[0]; /* stub for possible name */ }; The patch makes the changes needed for inotify_rm_watch(). Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Robert Love <rlove@google.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05fix breakage in reiserfs_new_inode()Al Viro
now that we use ih.key earlier, we need to do all its setup early enough Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05fix the treatment of jfs special inodesAl Viro
We used to put them on a single list, without any locking. Racy. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05vfs: remove duplicate code in get_fs_type()Li Zefan
save 14 bytes: text data bss dec hex filename 1354 32 4 1390 56e fs/filesystems.o.before text data bss dec hex filename 1340 32 4 1376 560 fs/filesystems.o Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05add a vfs_fsync helperChristoph Hellwig
Fsync currently has a fdatawrite/fdatawait pair around the method call, and a mutex_lock/unlock of the inode mutex. All callers of fsync have to duplicate this, but we have a few and most of them don't quite get it right. This patch adds a new vfs_fsync that takes care of this. It's a little more complicated as usual as ->fsync might get a NULL file pointer and just a dentry from nfsd, but otherwise gets afile and we want to take the mapping and file operations from it when it is there. Notes on the fsync callers: - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the lower file - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host file, and returning 0 when ->fsync was missing - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor taking i_mutex. Now given that shared memory doesn't have disk backing not doing anything in fsync seems fine and I left it out of the vfs_fsync conversion for now, but in that case we might just not pass it through to the lower file at all but just call the no-op simple_sync_file directly. [and now actually export vfs_fsync] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05sys_execve and sys_uselib do not call into fsnotifyEric Paris
sys_execve and sys_uselib do not call into fsnotify so inotify does not get open events for these types of syscalls. This patch simply makes the requisite fsnotify calls. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05zero i_uid/i_gid on inode allocationAl Viro
... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05inode->i_op is never NULLAl Viro
We used to have rather schizophrenic set of checks for NULL ->i_op even though it had been eliminated years ago. You'd need to go out of your way to set it to NULL explicitly _and_ a bunch of code would die on such inodes anyway. After killing two remaining places that still did that bogosity, all that crap can go away. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05ntfs: don't NULL i_opAl Viro
it's already set to empty table (and no, ntfs doesn't have any explicit checks for NULL ->i_op or NULL ->i_fop) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05isofs check for NULL ->i_op in root directory is dead codeAl Viro
for one thing it never happens, for another we check that inode is a directory right after that place anyway (and we'd already checked that reading it from disk has not failed). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05affs: do not zero ->i_opAl Viro
it is already set to empty table and should never be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05vfs: lseek(fd, 0, SEEK_CUR) race conditionAlain Knaff
This patch fixes a race condition in lseek. While it is expected that unpredictable behaviour may result while repositioning the offset of a file descriptor concurrently with reading/writing to the same file descriptor, this should not happen when merely *reading* the file descriptor's offset. Unfortunately, the only portable way in Unix to read a file descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this concurrently with read/write may mess up the position. [with fixes from akpm] Signed-off-by: Alain Knaff <alain@knaff.lu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.Tao Ma
In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*() functions", the wrong buffer_head is accessed. So change it to the right buffer_head. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: use min_t in ocfs2_quota_read()Mark Fasheh
This is preferred to min(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: remove unneeded lvb castsMark Fasheh
dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb(). This is pointless however, as ocfs2_dlm_lvb() returns void *. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Add xattr support checking in init_securityTiger Yang
We must check whether ocfs2 volume support xattr in init_security, if not support xattr and security is enable, would cause failure of mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: alloc xattr bucket in ocfs2_xattr_set_handleTiger Yang
In extreme situation, may need xattr bucket for setting security entry and acl entries during mknod. This only happens when block size is too small. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: calculate and reserve credits for xattr value in mknodTiger Yang
We extend the credits for xattr's large value in set_value_outside before, this can give rise to a credits issue when we set one security entry and two acl entries duing mknod. As we remove extend_trans form set_value_outside, we must calculate and reserve the credits for xattr's large value in mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/xattr: fix credits calculation during index createTao Ma
When creating a xattr index block, the old calculation forget to add credits for the meta change of the alloc file. So add more credits and more comments to explain it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/xattr: Always updating ctime during xattr set.Tao Ma
In xattr set, we should always update ctime if the operation goes sucessfully. The old one mistakenly put it in ocfs2_xattr_set_entry which is only called when we set xattr in inode or xattr block. The side benefit is that it resolve the bug 1052 since in that scenario, ocfs2_calc_xattr_set_need only calc out the xattr set credits while ocfs2_xattr_set_entry update the inode also which isn't concerned with the process of xattr set. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/xattr: Remove extend_trans call and add its credits from the beginningTao Ma
Actually, when setting a new xattr value, we know it from the very beginning, and it isn't like the extension of bucket in which case we can't figure it out. So remove ocfs2_extend_trans in that function and calculate it before the transaction. It also relieve acl operation from the worry about the side effect of ocfs2_extend_trans. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/dlm: Fix race during lockres masterySunil Mushran
dlm_get_lock_resource() is supposed to return a lock resource with a proper master. If multiple concurrent threads attempt to lookup the lockres for the same lockid while the lock mastery in underway, one or more threads are likely to return a lockres without a proper master. This patch makes the threads wait in dlm_get_lock_resource() while the mastery is underway, ensuring all threads return the lockres with a proper master. This issue is known to be limited to users using the flock() syscall. For all other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each lockid. Users encountering this bug will see flock() return EINVAL and dmesg have the following error: ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args Reported-by: Coly Li <coyli@suse.de> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking listSunil Mushran
This patch adds a new lock, dlm->tracking_lock, to protect adding/removing lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock for the same, but that proved inadequate as we could be freeing a lockres from a context that did not hold that lock. As the new lock only protects this list, we can explicitly take it when removing the lockres from the tracking list. This bug was exposed when testing multiple processes concurrently flock() the same file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migratingSunil Mushran
During lockres purge, o2dlm sends a drop reference message to the lockres master. This patch delays the message if the lockres is being migrated. Fixes oss bugzilla#1012 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()Sunil Mushran
Patch cleans printed errors in dlm_proxy_ast_handler(). The errors now includes the node number that sent the (b)ast. Also it reduces the number of endian swaps of the cookie. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2/dlm: Fix a race between migrate request and exit domainSunil Mushran
Patch address a racing migrate request message and an exit domain message. Instead of blocking exit domains for the duration of the migrate, we ignore failure to deliver that message. This is because an exiting domain should not have any active locks and thus has no role to play in the migration. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: One more hamming code optimization.Joel Becker
The previous optimization used a fast find-highest-bit-set operation to give us a good starting point in calc_code_bit(). This version lets the caller cache the previous code buffer bit offset. Thus, the next call always starts where the last one left off. This reduces the calculation another 39%, for a total 80% reduction from the original, naive implementation. At least, on my machine. This also brings the parity calculation to within an order of magnitude of the crc32 calculation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Another hamming code optimization.Joel Becker
In the calc_code_bit() function, we must find all powers of two beneath the code bit number, *after* it's shifted by those powers of two. This requires a loop to see where it ends up. We can optimize it by starting at its most significant bit. This shaves 32% off the time, for a total of 67.6% shaved off of the original, naive implementation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Don't hand-code xor in ocfs2_hamming_encode().Joel Becker
When I wrote ocfs2_hamming_encode(), I was following documentation of the algorithm and didn't have quite the (possibly still imperfect) grasp of it I do now. As part of this, I literally hand-coded xor. I would test a bit, and then add that bit via xor to the parity word. I can, of course, just do a single xor of the parity word and the source word (the code buffer bit offset). This cuts CPU usage by 53% on a mostly populated buffer (an inode containing utmp.h inline). Joel Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Enable metadata checksums.Joel Becker
Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>