summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-05-09GFS2: When adding a new dir entry, inc link count if it is a subdirSteven Whitehouse
This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation. Now that both adding and removing directory entries automatically update the parent directory's link count if required, that makes the code shorter and simpler than before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09GFS2: Make gfs2_dir_del update link count when requiredSteven Whitehouse
When we remove an entry from a directory, we can save ourselves some trouble if we know the type of the entry in question, since if it is itself a directory, we can update the link count of the parent at the same time as removing the directory entry. In addition this patch also merges the rmdir and unlink code which was almost identical anyway. This eliminates the calls to remove the . and .. directory entries on each rmdir (not needed since the directory will be deallocated, anyway) which was the only thing preventing passing the dentry to gfs2_dir_del(). The passing of the dentry rather than just the name allows us to figure out the type of the entry which is being removed, and thus adjust the link count when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09GFS2: Don't use gfs2_change_nlink in link syscallSteven Whitehouse
There are three users of gfs2_change_nlink which add to the link count. Two of these are about to be removed in later patches, so this means that there will no callers, when that happens allowing removal of that function, also in a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09ext4: remove unneeded ext4_journal_get_undo_accessTheodore Ts'o
The block allocation code used to use jbd2_journal_get_undo_access as a way to make changes that wouldn't show up until the commit took place. The new multi-block allocation code has a its own way of preventing newly freed blocks from getting reused until the commit takes place (it avoids updating the buddy bitmaps until the commit is done), so we don't need to use jbd2_journal_get_undo_access(), which has extra overhead compared to jbd2_journal_get_write_access(). There was one last vestigal use of ext4_journal_get_undo_access() in ext4_add_groupblocks(); change it to use ext4_journal_get_write_access() and then remove the ext4_journal_get_undo_access() support. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09ext4: move ext4_add_groupblocks() to mballoc.cAmir Goldstein
In preparation for the next patch, the function ext4_add_groupblocks() is moved to mballoc.c, where it could use some static functions. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09ext4: remove redundant #ifdef in super.cAmerigo Wang
There is already an #ifdef CONFIG_QUOTA some lines above, so this one is totally useless. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09ext4: remove redundant check for first_not_zeroed in ext4_register_li_requestTao Ma
We have checked first_not_zeroed == ngroups already above, so remove this redundant check. sbi->s_li_request = NULL above is also removed since it is NULL already. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09ext4: use s_inodes_per_block directly in __ext4_get_inode_locTao Ma
In __ext4_get_inode_loc, we calculate inodes_per_block every time by EXT4_BLOCK_SIZE(sb) / EXT4_INODE_SIZE(sb). AFAICS, this function is a hot path for ext4, so we'd better use s_inodes_per_block directly instead of calculating every time. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.cTao Ma
We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG for the new mballoc debug, but there isn't any EXT4_DEBUG. As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use EXT4FS_DEBUG in fsync.c. [ It doesn't really matter; although I'm including this commit for consistency's sake. The whole point of the #ifdef's is to disable the debugging code. In general you're not going to want to enable all of the code protected by EXT4FS_DEBUG at the same time. -- Ted ] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09fs: fixup warning part_discard_alignment_show()Jens Axboe
Stephen reports: ----- After merging the block tree, today's linux-next build (x86_64 allmodconfig) produced this warning: fs/partitions/check.c: In function 'part_discard_alignment_show': fs/partitions/check.c:263: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long long unsigned int' Introduced by commit ("block: Remove extra discard_alignment from hd_struct") ----- Fix it up by just removing the cast, we return an int already. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-08jbd2: only print the debugging information for tid wraparound onceTheodore Ts'o
If we somehow wrap, we don't want to keep printing the warning message over and over again. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-08jbd2: Fix forever sleeping process in do_get_write_access()Jan Kara
In do_get_write_access() we wait on BH_Unshadow bit for buffer to get from shadow state. The waking code in journal_commit_transaction() has a bug because it does not issue a memory barrier after the buffer is moved from the shadow state and before wake_up_bit() is called. Thus a waitqueue check can happen before the buffer is actually moved from the shadow state and waiting process may never be woken. Fix the problem by issuing proper barrier. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-06block: Remove extra discard_alignment from hd_struct.Tao Ma
Currently, hd_struct.discard_alignment is only used when we show /sys/block/sdx/sdx/discard_alignment. So remove it and calculate when it is asked to show. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: handle errors from coalesce_t2 cifs: refactor mid finding loop in cifs_demultiplex_thread cifs: sanitize length checking in coalesce_t2 (try #3) cifs: check for bytes_remaining going to zero in CIFS_SessSetup cifs: change bleft in decode_unicode_ssetup back to signed type
2011-05-06Validate size of EFI GUID partition entries.Timo Warns
Otherwise corrupted EFI partition tables can cause total confusion. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06btrfs: remove old unused commented out codeDavid Sterba
Remove code which has been #if0-ed out for a very long time and does not seem to be related to current codebase anymore. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-06btrfs: remove all unused functionsDavid Sterba
Remove static and global declarations and/or definitions. Reduces size of btrfs.ko by ~3.4kB. text data bss dec hex filename 402081 7464 200 409745 64091 btrfs.ko.base 398620 7144 200 405964 631cc btrfs.ko.remove-all Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-05GFS2: Don't use a try lock when promoting to a higher modeSteven Whitehouse
Previously we marked all locks being promoted to a higher mode with the try flag to avoid any potential deadlocks issues. The DLM is able to detect these and report them in way that GFS2 can deal with them correctly. So we can just request the required mode and wait for a response without needing to perform this check. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05GFS2: Double check link count under glockSteven Whitehouse
To avoid any possible races relating to the link count, we need to recheck it under the inode's glock in all cases where it matters. Also to ensure we never get any nasty surprises, this patch also ensures that once the link count has hit zero it can never be elevated by rereading in data from disk. The only place we cannot provide a proper solution is in rename in the case where we are removing a target inode and we discover that the target inode has been already unlinked on another node. The race window is very small, and we return EAGAIN in this case to indicate what has happened. The proper solution would be to move the lookup parts of rename from the vfs into library calls which the fs could call directly, but that is potentially a very big job and this fix should cover most cases for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: do not call __mark_dirty_inode under i_lock libceph: fix ceph_osdc_alloc_request error checks ceph: handle ceph_osdc_new_request failure in ceph_writepages_start libceph: fix ceph_msg_new error path ceph: use ihold() when i_lock is held
2011-05-04ceph: do not call __mark_dirty_inode under i_lockSage Weil
The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the one ceph callers that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the callers can do it outside of i_lock. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-04btrfs: remove unused function prototypesDavid Sterba
function prototypes without a body Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-03logfs: initialize superblock entries earlierLinus Torvalds
In particular, s_freeing_list needs to be initialized early, since it is used on some of the error paths when mounts fail. The mapping inode, for example, would be initialized and then free'd on an error path before s_freeing_list was initialized, but the inode drop operation needs the s_freeing_list to be set up. Normally you'd never see this, because not only is logfs fairly rare, but a successful mount will never have any issues. Reported-by: werner <w.landgraf@ru.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-03ceph: handle ceph_osdc_new_request failure in ceph_writepages_startHenry C Chang
We should unlock the page and return -ENOMEM if ceph_osdc_new_request failed. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03ceph: use ihold() when i_lock is heldSage Weil
See 0444d76ae64fffc7851797fc1b6ebdbb44ac504a. Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03ext4: reimplement convert and split_unwrittenYongqiang Yang
Reimplement ext4_ext_convert_to_initialized() and ext4_split_unwritten_extents() using ext4_split_extent() Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03ext4: add ext4_split_extent_at() and ext4_split_extent()Yongqiang Yang
Add two functions: ext4_split_extent_at(), which splits an extent into two extents at given logical block, and ext4_split_extent() which splits an extent into three extents. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03ext4: add a function merging extents right and leftYongqiang Yang
1) Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right(). 2) Add a new function ext4_ext_try_to_merge() which tries to merge an extent both left and right. 3) Use the new function in ext4_ext_convert_unwritten_endio() and ext4_ext_insert_extent(). Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03ext4: fix deadlock in ext4_symlink() in ENOSPC conditionsJan Kara
ext4_symlink() cannot call __page_symlink() with transaction open. __page_symlink() calls ext4_write_begin() which can wait for transaction commit if we are running out of space thus causing a deadlock. Also error recovery in ext4_truncate_failed_write() does not count with the transaction being already started (although I'm not aware of any particular deadlock here). Fix the problem by stopping a transaction before calling __page_symlink() (we have to be careful and put inode to orphan list so that it gets deleted in case of crash) and starting another one after __page_symlink() returns for addition of symlink into a directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03ext4: Fix fs corruption when make_indexed_dir() failsJan Kara
When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This lead to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03ext4: set extents flag when migrating file to use extentsTheodore Ts'o
Fix a typo that was introduced in commit 07a038245b (in 2.6.36) which caused the extents flag not to be set at the conclusion of converting an inode to use extents. Reported-by: Peter Uchno <peter.uchno@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03GFS2: Improve bug trap code in ->releasepage()Steven Whitehouse
If the buffer is dirty or pinned, then as well as printing a warning, we should also refuse to release the page in question. Currently this can occur if there is a race between mmap()ed writers and O_DIRECT on the same file. With the addition of ->launder_page() in the future, we should be able to close this gap. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03GFS2: Fix ail list traversalSteven Whitehouse
In the recent patches to update the AIL list code, I managed to forget that the ail list lock got dropped, even though I added a comment specifically to remind myself :( Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03GFS2: make sure fallocate bytes is a multiple of blksizeBenjamin Marzinski
The GFS2 fallocate code chooses a target size to for allocating chunks of space. Whenever it can't find any resource groups with enough space free, it halves its target. Since this target is in bytes, eventually it will no longer be a multiple of blksize. As long as there is more space available in the resource group than the target, this isn't a problem, since gfs2 will use the actual space available, which is always a multiple of blksize. However, when gfs couldn't fallocate a bigger chunk than the target, it was using the non-blksize aligned number. This caused a BUG in later code that required blksize aligned offsets. GFS2 now ensures that bytes is always a multiple of blksize Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03cifs: handle errors from coalesce_t2Jeff Layton
cifs_demultiplex_thread calls coalesce_t2 to try and merge follow-on t2 responses into the original mid buffer. coalesce_t2 however can return errors, but the caller doesn't handle that situation properly. Fix the thread to treat such a case as it would a malformed packet. Mark the mid as being malformed and issue the callback. Cc: stable@kernel.org Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-03cifs: refactor mid finding loop in cifs_demultiplex_threadJeff Layton
...to reduce the extreme indentation. This should introduce no behavioral changes. Cc: stable@kernel.org Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-03CRED: Fix load_flat_shared_library() to initialise bprm correctlyDavid Howells
Fix binfmt_flag's load_flat_shared_library() to initialise bprm correctly. Currently, prepare_binprm() is called with only .filename .file and .cred fields set in bprm, but the .cred_prepared and .per_clear fields at least need initialising. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-05-02timerfd: Allow timers to be cancelled when clock was setThomas Gleixner
Some applications must be aware of clock realtime being set backward. A simple example is a clock applet which arms a timer for the next minute display. If clock realtime is set backward then the applet displays a stale time for the amount of time which the clock was set backwards. Due to that applications poll the time because we don't have an interface. Extend the timerfd interface by adding a flag which puts the timer onto a different internal realtime clock. All timers on this clock are expired whenever the clock was set. The timerfd core records the monotonic offset when the timer is created. When the timer is armed, then the current offset is compared to the previous recorded offset. When it has changed, then timerfd_settime returns -ECANCELED. When a timer is read the offset is compared and if it changed -ECANCELED returned to user space. Periodic timers are not rearmed in the cancelation case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Chris Friesen <chris.friesen@genband.com> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davide Libenzi <davidel@xmailserver.org> Reviewed-by: Alexander Shishkin <virtuoso@slind.org> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'for-linus' of git://git.infradead.org/ubifs-2.6: UBIFS: seek journal heads to the latest bud in replay UBIFS: do not free write-buffers when in R/O mode
2011-05-02UBIFS: seek journal heads to the latest bud in replayArtem Bityutskiy
This is the second fix of the following symptom: UBIFS error (pid 34456): could not find an empty LEB which sometimes happens after power cuts when we mount the file-system - UBIFS refuses it with the above error message which comes from the 'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test with the UBIFS power cut emulation enabled. Analysis of the problem. Currently UBIFS replay seeks the journal heads to the last _replayed_ bud. But the buds are replayed out-of-order, so the replay basically seeks journal heads to the "random" bud belonging to this head, and not to the _last_ one. The result of this is that the GC head may be seeked to a full LEB with no free space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a fully or mostly dirty LEB to match the current GC head (because we need to garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum). So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and also fail. As a result - recovery fails and mounting fails. This patch teaches the replay to initialize the GC heads exactly to the latest buds, i.e. the buds which have the largest sequence number in corresponding log reference nodes. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-05-02UBIFS: do not free write-buffers when in R/O modeArtem Bityutskiy
Currently UBIFS has a small optimization - it frees write-buffers when it is re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it does not allocate write-buffers as well. This optimization is nice but it leads to subtle problems and complications in recovery, which I can reproduce using the integck test. The symptoms are that after a power cut the file-system cannot be mounted if we first mount it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints: UBIFS error (pid 34456): could not find an empty LEB Analysis of the problem. When mounting R/W, the reply process sets journal heads to buds [1], but when mounting R/O - it does not do this, because the write-buffers are not allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the same file-system but for the following 2 cases: 1. mounting R/W after a power cut and recover 2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery In the former case, we have journal heads seeked to the a bud, in the latter case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not try to recover the GC LEB by garbage-collecting to the GC head, but we just try to find an empty LEB, and there may be no empty LEBs, so we just fail. On the other hand, in the former case (mount R/W), we are able to make a GC LEB (@c->gc_lnum) by garbage-collecting. Thus, let's remove this small nice optimization and always allocate write-buffers. This should not make too big difference - we have only 3 of them, each of max. write unit size, which is usually 2KiB. So this is about 6KiB of RAM for the typical case, and only when mounted R/O. [1]: Note, currently the replay process is setting (seeking) the journal heads to _some_ buds, not necessarily to the buds which had been the journal heads before the power cut happened. This will be fixed separately. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-05-02btrfs: Document a mutex lock/unlock sequenceDavid Sterba
2011-05-02btrfs: drop unused parameter from btrfs_release_pathDavid Sterba
parameter tree root it's not used since commit 5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer interface for large blocksizes") Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: drop gfp parameter from alloc_extent_bufferDavid Sterba
pass GFP_NOFS directly to kmem_cache_alloc Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: drop gfp parameter from find_extent_bufferDavid Sterba
pass GFP_NOFS directly to kmem_cache_alloc Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: drop gfp parameter from alloc_extent_mapDavid Sterba
pass GFP_NOFS directly to kmem_cache_alloc Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: drop unused parameter from extent_map_tree_initDavid Sterba
the GFP flags are not stored anywhere and all allocations are done via alloc_extent_map(GFP_NOFS). Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: drop unused argument from extent_io_tree_initDavid Sterba
all callers pass GFP_NOFS, but the GFP mask argument is not used in the function; GFP_ATOMIC is passed to radix tree initialization and it's the only correct one, since we're using the preload/insert mechanism of radix tree. Let's drop the gfp mask from btrfs function, this will not change behaviour. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: make functions static when possibleDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-02btrfs: unify checking of IS_ERR and nullDavid Sterba
use IS_ERR_OR_NULL when possible, done by this coccinelle script: @ match @ identifier id; @@ ( - BUG_ON(IS_ERR(id) || !id); + BUG_ON(IS_ERR_OR_NULL(id)); | - IS_ERR(id) || !id + IS_ERR_OR_NULL(id) | - !id || IS_ERR(id) + IS_ERR_OR_NULL(id) ) Signed-off-by: David Sterba <dsterba@suse.cz>