path: root/fs
Age  Commit message  Author
2013-03-11hostfs: fix a not needed double checkMarco Stornelli
With the commit 3be2be0a32c18b0fd6d623cda63174a332ca0de1 we removed vmtruncate, but actually there is no need to call inode_newsize_ok() because the checks are already done in inode_change_ok() at the beginning of the function. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2013-03-10ext4: reserve metadata block for every delayed writeLukas Czerner
Currently we only reserve space (data+metadata) in delayed allocation if we're allocating from a new cluster (which is always the case in a non-bigalloc file system). That is OK for data blocks, because we reserve the whole cluster. However, we have to reserve metadata for every delayed block we're going to write, because every block could potentially require a metadata block when we need to grow the extent tree. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-03-10ext4: update reserved space after the 'correction'Lukas Czerner
Currently in ext4_ext_map_blocks() in delayed allocation writeback we would update the reservation and after that check whether we claimed a cluster outside of the range of the allocation and, if so, give the block back to the reservation pool. However, this also means that if the number of reserved data blocks dropped to zero before the correction, we would release all the metadata reservation as well, even though we might still need it because we're not done with the delayed allocation and there might be more blocks to come. This will result in error messages such as: EXT4-fs warning (device sdb): ext4_da_update_reserve_space:361: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 1 data blocks) This will only happen on a bigalloc file system and it can be easily reproduced using fiemap-tester from xfstests like this: ./src/fiemap-tester -m DHDHDHDHD -S -p0 /mnt/test/file Or using xfstests such as 225. Fix this by doing the correction first and updating the reservation after that, so that we do not accidentally decrease i_reserved_data_blocks to zero. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: do not use yield()Lukas Czerner
Using yield() is strongly discouraged (see sched/core.c), especially since we can just use cond_resched(). Replace all uses of yield() with cond_resched(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: remove unused variable in ext4_free_blocks()Lukas Czerner
Remove unused variable 'freed' in ext4_free_blocks(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: fix WARN_ON from ext4_releasepage()Jan Kara
ext4_releasepage() warns when it is passed a page with PageChecked set. However this can correctly happen when invalidate_inode_pages2_range() invalidates pages - and we should fail the release in that case. Since the page was dirty anyway, it won't be discarded and no harm has happened but it's good to be safe. Also remove bogus page_has_buffers() check - we are guaranteed page has buffers in this function. Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Tested-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-10ext4: fix the wrong number of the allocated blocks in ext4_split_extent()Zheng Liu
This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile, in this commit we use the right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
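The arithmetic in the example above can be checked with a small sketch. It assumes the bracketed pairs are half-open block ranges [start, end), which is the reading under which the quoted 24 and 16 come out. `struct blk_range` and `mapped_len_in_extent()` are hypothetical names used for illustration only; they are not ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open block range [start, end); not an ext4 type. */
struct blk_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Number of requested blocks that fall inside the extent, counted from
 * the start of the request. The request must begin inside the extent.
 */
uint32_t mapped_len_in_extent(struct blk_range map, struct blk_range ex)
{
    uint32_t end;

    assert(map.start >= ex.start && map.start < ex.end);

    /* Clamp the end of the request to the end of the extent. */
    end = map.end < ex.end ? map.end : ex.end;
    return end - map.start;
}
```

With map = {48, 72} and ex = {32, 64}, the requested length is 72 - 48 = 24 blocks, while the function returns 64 - 48 = 16, the part that the extent can actually cover.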
2013-03-10ext4: update extent status tree after an extent is zeroed outZheng Liu
When we try to split an extent, this extent could be zeroed out and marked as initialized. But we don't know this in ext4_map_blocks because it only returns the length of the allocated extent. Meanwhile we will mark this extent as uninitialized because we only check m_flags. This commit updates the extent status tree when we try to split an unwritten extent. We don't need to worry about the status of this extent because we always mark it as initialized. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10ext4: fix wrong m_len value after unwritten extent conversionZheng Liu
The ext4_ext_handle_uninitialized_extents() function was assuming the return value of ext4_ext_map_blocks() is equal to map->m_len. This incorrect assumption was harmless until we started to use the status tree as an extent cache, because we need to update the status tree according to the 'm_len' value. Meanwhile this commit marks the EXT4_MAP_MAPPED flag after unwritten extent conversion. It shouldn't cause a bug because we update the status tree by checking the EXT4_MAP_UNWRITTEN flag. But it should be fixed. After applying this commit, the following error message from the self-testing infrastructure disappears. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10ext4: add self-testing infrastructure to do a sanity checkDmitry Monakhov
This commit adds a self-testing infrastructure, like the one the extent tree has, to do a sanity check for the extent status tree. Now that the status tree is used as an extent cache, we had better make sure that it caches the right result. After applying this commit, we will get a lot of messages when we run xfstests, as below. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... kernel: ES cache assertation failed for inode: 230 es_cached ex [974/2/4781/20] != found ex [974/1/4781/1000] ... kernel: ES insert assertation failed for inode: 635 ex_status [0/45/21388/w] != es_status [44/1/21432/u] ... Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-10ext4: avoid a potential overflow in ext4_es_can_be_merged()Zheng Liu
Check the length of an extent to avoid a potential overflow in ext4_es_can_be_merged(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
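A minimal sketch of this kind of guard, assuming extent lengths are kept in a 32-bit field. `struct ext_span` and `ext_span_can_merge()` are hypothetical stand-ins, not the kernel's definitions. The sum is computed in 64 bits so that the check itself cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: first logical block and length in blocks. */
struct ext_span {
    uint32_t lblk;
    uint32_t len;
};

bool ext_span_can_merge(const struct ext_span *a, const struct ext_span *b)
{
    assert(a != NULL && b != NULL);

    /* Widen before adding so that the comparison sees the true sum. */
    if ((uint64_t)a->len + b->len > UINT32_MAX)
        return false;

    /* b must start exactly where a ends. */
    return (uint64_t)a->lblk + a->len == b->lblk;
}
```

Adding the two lengths directly in 32 bits could wrap to a small value and let a merge through whose combined length no longer fits the field; widening first avoids that.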
2013-03-09Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds
Pull namespace bugfixes from Eric Biederman: "This is three simple fixes against 3.9-rc1. I have tested each of these fixes and verified they work correctly. The userns oops in key_change_session_keyring and the BUG_ON triggered by proc_ns_follow_link were found by Dave Jones. I am including the enhancement for mount to only trigger requests of filesystem modules here instead of delaying this for the 3.10 merge window because it is both trivial and the kind of change that tends to bit-rot if left untouched for two months."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use nd_jump_link in proc_ns_follow_link
  fs: Limit sys_mount to only request filesystem modules (Part 2).
  fs: Limit sys_mount to only request filesystem modules.
  userns: Stop oopsing in key_change_session_keyring
2013-03-09proc: Use nd_jump_link in proc_ns_follow_linkEric W. Biederman
Update proc_ns_follow_link to use nd_jump_link instead of just manually updating nd.path.dentry. This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave Jones and reproduced trivially with mkdir /proc/self/ns/uts/a. Sigh, it looks like the VFS change to require use of nd_jump_link happened while proc_ns_follow_link was baking, and since the common case of proc_ns_follow_link continued to work without problems the need for making this change was overlooked. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-08Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds
Pull btrfs fixes from Chris Mason: "These are scattered fixes and one performance improvement. The biggest functional change is in how we throttle metadata changes. The new code bumps our average file creation rate up by ~13% in fs_mark, and lowers CPU usage. Stefan bisected out a regression in our allocation code that made balance loop on extents larger than 256MB."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: improve the delayed inode throttling
  Btrfs: fix a mismerge in btrfs_balance()
  Btrfs: enforce min_bytes parameter during extent allocation
  Btrfs: allow running defrag in parallel to administrative tasks
  Btrfs: avoid deadlock on transaction waiting list
  Btrfs: do not BUG_ON on aborted situation
  Btrfs: do not BUG_ON in prepare_to_reloc
  Btrfs: free all recorded tree blocks on error
  Btrfs: build up error handling for merge_reloc_roots
  Btrfs: check for NULL pointer in updating reloc roots
  Btrfs: fix unclosed transaction handler when the async transaction commitment fails
  Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails
  Btrfs: use set_nlink if our i_nlink is 0
2013-03-08Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "A small set of cifs fixes which includes one for a recent regression in the write path (pointed out by Anton), some fixes for rename problems and as promised for 3.9 removing the obsolete sockopt mount option (and the accompanying deprecation warning)."
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix missing of oplock_read value in smb30_values structure
  cifs: don't try to unlock pagecache page after releasing it
  cifs: remove the sockopt= mount option
  cifs: Check server capability before attempting silly rename
  cifs: Fix bug when checking error condition in cifs_rename_pending_delete()
2013-03-08vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlinkLinus Torvalds
It's "normal" - it can happen if the file descriptor you followed was opened with O_NOFOLLOW. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08f2fs: fix overflow when calculating utilization on 32-bitChangman Lee
Use div_u64 to fix overflow when calculating utilization. *long int* is 4 bytes on 32-bit, so (user blocks * 100) might overflow if the disk size is over e.g. 512GB. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
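The overflow can be reproduced in userspace with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, a 4 KiB block size is assumed, unsigned 32-bit arithmetic stands in for the 4-byte long so that the wraparound is well defined, and a plain 64-bit division stands in for the kernel's div_u64.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage computed entirely in 32 bits; valid_blocks * 100 can wrap. */
uint32_t utilization_32(uint32_t valid_blocks, uint32_t user_blocks)
{
    assert(user_blocks != 0);
    /* The cast forces the product back to 32 bits before dividing. */
    return (uint32_t)(valid_blocks * 100u) / user_blocks;
}

/* Same percentage with the product widened to 64 bits first. */
uint32_t utilization_64(uint32_t valid_blocks, uint32_t user_blocks)
{
    assert(user_blocks != 0);
    return (uint32_t)((uint64_t)valid_blocks * 100u / user_blocks);
}
```

For a 512 GiB device with 4 KiB blocks (134217728 blocks) that is half full (67108864 valid blocks), utilization_32() returns 18 while utilization_64() returns the expected 50.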
2013-03-07Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfsLinus Torvalds
Pull ecryptfs fixes from Tyler Hicks: "Minor code cleanups and new Kconfig option to disable /dev/ecryptfs
The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to userspace communication channel to be compiled out. This may be the first step in it being eventually removed."
Hmm. I'm not sure whether these should be called "fixes", and it probably should have gone in the merge window. But I'll let it slide.
* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: allow userspace messaging to be disabled
  eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
  ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
  eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
  eCryptfs: remove unneeded checks in virt_to_scatterlist()
  eCryptfs: Fix -Wmissing-prototypes warnings
  eCryptfs: Fix -Wunused-but-set-variable warnings
  eCryptfs: initialize payload_len in keystore.c
2013-03-07xfs: rearrange some code in xfs_bmap for better localityDave Chinner
xfs_bmap.c is a big file, and some of the related code is spread throughout the file, requiring function prototypes for static functions and jumping all through the file to follow a single call path. Rearrange the code so that:
a) related functionality is grouped together; and
b) functions are grouped in call dependency order
While the diffstat is large, there are no code changes in the patch; it is just moving the functionality around and removing the function prototypes at the top of the file. The resulting layout of the code is as follows (top of file to bottom):
- miscellaneous helper functions
- extent tree block counting routines
- debug/sanity checking code
- bmap free list manipulation functions
- inode fork format manipulation functions
- internal/external extent tree search functions
- extent tree manipulation functions used during allocation
- functions used during extent read/allocate/removal operations (i.e. xfs_bmapi_write, xfs_bmapi_read, xfs_bunmapi and xfs_getbmap)
This means that following logic paths through the bmapi code is much simpler: most of the code relevant to a specific operation is now clustered together rather than spread all over the file. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07xfs: rename random32() to prandom_u32()Akinobu Mita
Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: <bpm@sgi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: xfs@oss.sgi.com Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07xfs: don't verify buffers after IO errorsDave Chinner
When we read a buffer, we might get an error from the underlying block device and not the real data. Hence if we get an IO error, we shouldn't run the verifier but instead just pass the IO error straight through. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07xfs: fix xfs_iomap_eof_prealloc_initial_size typeMark Tinguely
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to xfs_fsblock_t to reflect the fact that the return value may be an unsigned 64-bit value if XFS_BIG_BLKNOS is defined. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-07xfs: increase prealloc size to double that of the previous extentBrian Foster
The updated speculative preallocation algorithm for handling sparse files can become less effective in situations with a high number of concurrent, sequential writers. The number of writers and amount of available RAM affect the writeback bandwidth slicing algorithm, which in turn affects the block allocation pattern of XFS. For example, running 32 sequential writers on a system with 32GB RAM, preallocs become fixed at a value of around 128MB (instead of steadily increasing to the 8GB maximum as sequential writes proceed). Update the speculative prealloc heuristic to base the size of the next prealloc on double the size of the preceding extent. This preserves the original aggressive speculative preallocation behavior and continues to accommodate sparse files at a slight cost of increasing the size of preallocated data regions following holes of sparse files. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
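The doubling rule described above can be sketched as follows. This is only an illustration of "double the preceding extent, up to a maximum": `next_prealloc_size()` is a hypothetical helper, and the real heuristic is likely to weigh other inputs (such as free space) that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of the next speculative preallocation, given the
 * size of the preceding extent and an upper bound, both in the same unit.
 */
uint64_t next_prealloc_size(uint64_t prev_extent, uint64_t max_prealloc)
{
    assert(max_prealloc > 0);

    /* Compare against max/2 first so that the doubling cannot overflow. */
    if (prev_extent > max_prealloc / 2)
        return max_prealloc;
    return prev_extent * 2;
}
```

With an 8GB cap, a preceding extent of 128MB yields a 256MB preallocation, and a preceding extent of 6GB is clamped to 8GB.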
2013-03-07xfs: fix potential infinite loop in xfs_iomap_prealloc_size()Brian Foster
If freesp == 0, we could end up in an infinite loop while squashing the preallocation. Break the loop when we've killed the prealloc entirely. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
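A minimal sketch of the loop shape in question. `squash_prealloc()` is a hypothetical function, and the shift by 4 is an illustrative shrink factor rather than a statement about the XFS code.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: shrink a preallocation until it is smaller than the
 * available free space.
 */
uint64_t squash_prealloc(uint64_t alloc_blocks, uint64_t freesp)
{
    /*
     * Without the "alloc_blocks != 0" test, freesp == 0 would keep
     * "alloc_blocks >= freesp" true forever, because 0 >= 0 holds and
     * shifting zero leaves it at zero.
     */
    while (alloc_blocks != 0 && alloc_blocks >= freesp)
        alloc_blocks >>= 4;
    return alloc_blocks;
}
```

Calling squash_prealloc(4096, 0) terminates and returns 0, which corresponds to the preallocation having been killed entirely.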
2013-03-07Btrfs: improve the delayed inode throttlingChris Mason
The delayed inode code batches up changes to the btree in hopes of doing them in bulk. As the changes build up, processes kick off worker threads and wait for them to make progress. The current code kicks off an async work queue item for each delayed node, which creates a lot of churn. It also uses a fixed 1 HZ waiting period for the throttle, which allows us to build a lot of pending work and can slow down the commit. This changes us to watch a sequence counter as it is bumped during the operations. We kick off fewer work items and have each work item do more work. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-07fs: Limit sys_mount to only request filesystem modules (Part 2).Eric W. Biederman
Add missing MODULE_ALIAS_FS("ocfs2"); how did I miss that? Remove unnecessary MODULE_ALIAS_FS("devpts"); devpts can not be modular. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-06Btrfs: fix a mismerge in btrfs_balance()Ilya Dryomov
Raid56 merge (merge commit e942f88) had mistakenly removed a call to __cancel_balance(), which resulted in balance not cleaning up after itself after a successful finish. (Cleanup includes switching the state, removing the balance item and releasing mut_ex_op testnset lock.) Bring it back. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-06CIFS: Fix missing of oplock_read value in smb30_values structurePavel Shilovsky
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: don't try to unlock pagecache page after releasing itJeff Layton
We had a recent fix to fix the release of pagecache pages when cifs_writev_requeue writes fail. Unfortunately, it releases the page before trying to unlock it. At that point, the page might be gone by the time the unlock comes in. Unlock the page first before checking the value of "rc", and only then end writeback and release the pages. The page lock isn't required for any of those operations so this should be safe. Reported-by: Anton Altaparmakov <aia21@cam.ac.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: remove the sockopt= mount optionJeff Layton
...as promised for 3.9. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9Chris Mason
2013-03-06cifs: Check server capability before attempting silly renameSachin Prabhu
cifs_rename_pending_delete() attempts to silly-rename the file using CIFSSMBRenameOpenFile(). This uses the SET_FILE_INFORMATION TRANS2 command with the information level set to the passthru info-level SMB_SET_FILE_RENAME_INFORMATION. We need to check that the server supports passthru info-levels before attempting the silly rename, or else we will fail to rename the file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-06cifs: Fix bug when checking error condition in cifs_rename_pending_delete()Sachin Prabhu
Fix check for error condition after setting attributes with CIFSSMBSetFileInfo(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-05Btrfs: enforce min_bytes parameter during extent allocationChris Mason
Commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536 changed preallocation of extents to cap the max size we try to allocate. It's a valid change, but the extent reservation code is also used by balance, and that can't tolerate a smaller extent being allocated. __btrfs_prealloc_file_range already has a min_size parameter, which is used by relocation to request a specific extent size. This commit adds an extra check to enforce that minimum extent size. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2013-03-04Btrfs: allow running defrag in parallel to administrative tasksStefan Behrens
Commit 5ac00add added a testnset mutex and code that disallows running administrative tasks in parallel. It prevents the device add/delete/balance/replace/resize operations from being started in parallel. By mistake, the defragmentation operation was included in the check for mutual exclusion as well. This is fixed with this commit. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: avoid deadlock on transaction waiting listLiu Bo
Only let one trans handle wait for other handles, otherwise we will get ABBA issues. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: do not BUG_ON on aborted situationLiu Bo
Btrfs balance can easily hit BUG_ON in these places, but we want it to bail out gracefully after we force the whole filesystem to readonly. So we use the btrfs_std_error hook in place of BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: do not BUG_ON in prepare_to_relocLiu Bo
We can bail out from here gracefully instead of a cold BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: free all recorded tree blocks on errorLiu Bo
We've missed the 'free blocks' part on ENOMEM error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: build up error handling for merge_reloc_rootsLiu Bo
We first use the btrfs_std_error hook to replace BUG_ON, and we also need to clean up what is left, including the reloc roots rbtree and the reloc roots list. Here we use a helper function to clean up both the rbtree and the list, and since this function can also be used in the balance recover path, we make that change as well to keep the code simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: check for NULL pointer in updating reloc rootsLiu Bo
Add a check for NULL pointer to avoid invalid reference. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: fix unclosed transaction handler when the async transaction commitment failsMiao Xie
If the async transaction commitment failed, we need to close the current transaction handler, or the current transaction will be blocked from committing because of this orphan handler. We fix the problem by doing a sync transaction commitment, that is, by invoking btrfs_commit_transaction(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: fix wrong handle at error path of create_snapshot() when the commit failsMiao Xie
There are several bugs in the error path of create_snapshot() when the transaction commitment fails.
- access the freed transaction handler. At the end of the transaction commitment, the transaction handler was freed, so we should not access it after the transaction commitment.
- we were not aware of the error which happened during the snapshot creation if we submitted an async transaction commitment.
- pending snapshot access vs pending snapshot free. When something went wrong after we submitted an async transaction commitment, the transaction committer would clean up the pending snapshots and free them. But the snapshot creators were not aware of it, so they would access the freed pending snapshots.
This patch fixes the above problems by:
- removing the dangerous code that accessed the freed handler
- assigning ->error if the error happens during the snapshot creation
- making the transaction committer not free the pending snapshots; it just assigns the error number and evicts them before we unblock the transaction.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04Btrfs: use set_nlink if our i_nlink is 0Josef Bacik
We need to inc the nlink of deleted entries when running replay so we can do the unlink on the fs_root and get everything cleaned up and then have the orphan cleanup do the right thing. The problem is inc_nlink complains about this, even though it still does the right thing. So use set_nlink() if our i_nlink is 0 to keep users from seeing the warnings during log replay. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
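The choice between the two helpers can be modeled in userspace as below. This is a toy model under stated assumptions: `struct toy_inode`, `toy_inc_nlink()`, `toy_set_nlink()` and `replay_bump_nlink()` are hypothetical names that only imitate the behavior described in the message (an increment helper that complains when the count is zero, and a setter that does not).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of an inode that matters here. */
struct toy_inode {
    unsigned int nlink;
    unsigned int warnings;  /* counts "incremented from zero" complaints */
};

/* Models an increment helper that complains when the count was zero. */
void toy_inc_nlink(struct toy_inode *inode)
{
    if (inode->nlink == 0)
        inode->warnings++;
    inode->nlink++;
}

/* Sets the count directly, without any complaint. */
void toy_set_nlink(struct toy_inode *inode, unsigned int nlink)
{
    inode->nlink = nlink;
}

/* Replay-side bump: use the setter when the count is currently zero. */
void replay_bump_nlink(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->nlink == 0)
        toy_set_nlink(inode, 1);
    else
        toy_inc_nlink(inode);
}
```

Starting from a link count of 0, replay_bump_nlink() ends with a count of 1 and no recorded warning, whereas calling toy_inc_nlink() directly would reach the same count but record one warning.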
2013-03-03eCryptfs: allow userspace messaging to be disabledKees Cook
When the userspace messaging (for the less common case of userspace key wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with it removed. This saves on kernel code size and reduces potential attack surface by removing the /dev/ecryptfs node. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2013-03-04ext4: invalidate extent status tree during extent migrationDmitry Monakhov
mext_replace_branches() will change inode's extents layout so we have to drop corresponding cache. TESTCASE: 301'th xfstest was not yet accepted to official xfstest's branch and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-03-04ext4: remove unnecessary wait for extent conversion in ext4_fallocate()Jan Kara
Now that we don't merge uninitialized extents anymore, ext4_fallocate() is free to operate on the inode while there are still some extent conversions pending - it won't disturb them in any way. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-04ext4: add warning to ext4_convert_unwritten_extents_endioDmitry Monakhov
Splitting extents inside endio is a bad thing, but unfortunately it is still possible. In fact we are pretty close to the moment when all related issues will be fixed. Let's warn the developer if it is still the case. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-03-04ext4: disable merging of uninitialized extentsDmitry Monakhov
Derived from Jan's patch: http://permalink.gmane.org/gmane.comp.file-systems.ext4/36470 Merging of uninitialized extents creates all sorts of interesting race possibilities when writeback / DIO races with fallocate. Thus ext4_convert_unwritten_extents_endio() has to deal with a case where the extent to be converted needs to be split out first. That isn't nice for two reasons: 1) It may need allocation of an extent tree block, so ENOSPC is possible. 2) It complicates the end_io handling code. So we disable merging of uninitialized extents, which allows us to simplify the code. Extents will get merged after they are converted to initialized ones. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-03-04ext4: ext4_split_extent should take care of extent zerooutDmitry Monakhov
When ext4_split_extent_at() ends up doing zeroout & conversion to initialized instead of split & conversion, ext4_split_extent() gets confused and can wrongly mark the extent back as uninitialized. The end IO code then gets confused by large unwritten extents, which may result in data loss. The example of problematic behavior is:
                      lblk len               lblk len
ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
  ext4_split_extent_at() (split [1000,30,uninit] at 1020)
    ext4_ext_insert_extent() -> ENOSPC
    ext4_ext_zeroout()
      -> extent [1000,30] is now initialized
  ext4_split_extent_at() (split [1000,30,init] at 1010, MARK_UNINIT1 | MARK_UNINIT2)
    -> extent is split and parts marked as uninitialized
Fix the problem by rechecking the extent type after the first ext4_split_extent_at() returns. None of the split_flags can be applied to an initialized extent, so this patch also adds a BUG_ON to prevent similar issues in the future. TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>