summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-01-16xfs: remove int casts from debug dquot soft limit timer assertsBrian Foster
The int casts here make it easy to trigger an assert with a large soft limit. For example, set a >4TB soft limit on an empty volume to reproduce a (0 > -x) comparison due to an overflow of d_blk_softlimit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: fix the multi-segment log buffer formatMark Tinguely
Per Dave Chinner suggestion, this patch: 1) Corrects the detection of whether a multi-segment buffer is still tracking data. 2) Clears all the buffer log formats for a multi-segment buffer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: fix segment in xfs_buf_item_format_segmentMark Tinguely
Not every segment in a multi-segment buffer is dirty in a transaction and they will not be outputted. The assert in xfs_buf_item_format_segment() that checks for the at least one chunk of data in the segment to be used is not necessary true for multi-segmented buffers. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: rename bli_format to avoid confusion with bli_formatsMark Tinguely
Rename the bli_format structure to __bli_format to avoid accidently confusing them with the bli_formats pointer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: use b_maps[] for discontiguous buffersMark Tinguely
Commits starting at 77c1a08 introduced a multiple segment support to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment buffer in the transaction because it was looking at the single segment block number rather than the multi-segment b_maps[0].bm.bn. This results on a recursive buffer lock that can never be satisfied. This patch: 1) Changed the remaining b_map accesses to be b_maps[0] accesses. 2) Renames the single segment b_map structure to __b_map to avoid future confusion. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 and udf fixes from Jan Kara: "One ext3 performance regression fix and one udf regression fix (oops on interrupted mount)." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: UDF: Fix a null pointer dereference in udf_sb_free_partitions jbd: don't wake kjournald unnecessarily
2013-01-15time: create __getnstimeofday for WARNless callsKees Cook
The pstore RAM backend can get called during resume, and must be defensive against a suspended time source. Expose getnstimeofday logic that returns an error instead of a WARN. This can be detected and the timestamp can be zeroed out. Reported-by: Doug Anderson <dianders@chromium.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15UBIFS: rename random32() to prandom_u32()Akinobu Mita
Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-01-15f2fs: fix the debugfs entry creation pathNamjae Jeon
As the "status" debugfs entry will be maintained for entire F2FS filesystem irrespective of the number of partitions. So, we can move the initialization to the init part of the f2fs and destroy will be done from exit part. After making changes, for individual partition mount - entry creation code will not be executed. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-15f2fs: add global mutex_lock to protect f2fs_stat_listmajianpeng
There is an race condition between umounting f2fs and reading f2fs/status, which results in oops. Fox example: Thread A Thread B umount f2fs cat f2fs/status f2fs_destroy_stats() { stat_show() { list_for_each_entry_safe(&f2fs_stat_list) list_del(&si->stat_list); mutex_lock(&si->stat_lock); si->sbi = NULL; mutex_unlock(&si->stat_lock); kfree(sbi->stat_info); } mutex_lock(&si->stat_lock) <- si is gone. ... } Solution with a global lock: f2fs_stat_mutex: Thread A Thread B umount f2fs cat f2fs/status f2fs_destroy_stats() { stat_show() { mutex_lock(&f2fs_stat_mutex); list_del(&si->stat_list); mutex_unlock(&f2fs_stat_mutex); kfree(sbi->stat_info); mutex_lock(&f2fs_stat_mutex); } list_for_each_entry_safe(&f2fs_stat_list) ... mutex_unlock(&f2fs_stat_mutex); } Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> [jaegeuk.kim@samsung.com: fix typos, description, and remove the existing lock] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-15f2fs: remove the blk_plug usage in f2fs_write_data_pagesNamjae Jeon
Let's consider the usage of blk_plug in f2fs_write_data_pages(). We can come up with the two issues: lock contention and task awareness. 1. Merging bios prior to grabing "queue lock" The f2fs merges consecutive IOs in the file system level before submitting any bios, which is similar with the back merge by the plugging mechanism in attempt_plug_merge(). Both of them need to acquire no queue lock. 2. Merging policy with respect to tasks The f2fs merges IOs as much as possible regardless of tasks, while blk-plugging is conducted on a basis of tasks. As we can understand there are trade-offs, f2fs tries to maximize the write performance with well-merged bios. As a result, if f2fs produces many consecutive but separated bios in writepages(), it would be good to use blk-plugging since f2fs would be able to avoid queue lock contention in the block layer by merging them. But, f2fs merges IOs and submit one bio, which means that there are not much chances to merge bios by attempt_plug_merge(). However, f2fs has already been used blk_plug by triggering generic_writepages() in f2fs_write_data_pages(). So to make the overall code consistency, I'd like to remove blk_plug there. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-14UDF: Fix a null pointer dereference in udf_sb_free_partitionsNamjae Jeon
This patch fixes a regression caused by commit bff943af6fe "udf: Fix memory leak when mounting" due to which it was triggering a kernel null point dereference in case of interrupted mount OR when allocating memory to sbi->s_partmaps failed in function udf_sb_alloc_partition_maps. Reported-and-tested-by: James Hogan <james@albanarts.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-14jbd: don't wake kjournald unnecessarilyEric Sandeen
Don't send an extra wakeup to kjournald in the case where we already have the proper target in j_commit_request, i.e. that commit has already been requested for commit. commit d9b0193 "jbd: fix fsync() tid wraparound bug" changed the logic leading to a wakeup, but it caused some extra wakeups which were found to lead to a measurable performance regression. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-14vfs: add missing virtual cache flush after editing partial pagesLinus Torvalds
Andrew Morton pointed this out a month ago, and then I completely forgot about it. If we read a partial last page of a block device, we will zero out the end of the page, but since that page can then be mapped into user space, we should also make sure to flush the cache on architectures that have virtual caches. We have the flush_dcache_page() function for this, so use it. Now, in practice this really never matters, because nobody sane uses virtual caches to begin with, and they largely exist on old broken RISC arhitectures. And even if you did run on one of those obsolete CPU's, the whole "mmap and access the last partial page of a block device" behavior probably doesn't actually exist. The normal IO functions (read/write) will never see the zeroed-out part of the page that migth not be coherent in the cache, because they honor the size of the device. So I'm marking this for stable (3.7 only), but I'm not sure anybody will ever care. Pointed-out-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org # 3.7 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-14btrfs: update timestamps on truncate()Eric Sandeen
truncate() vs. ftruncate() differ in the VFS; truncate() doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the fs to do the timestamp updates if the size changes. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
2013-01-14btrfs: fix btrfs_cont_expand() freeing IS_ERR emZach Brown
btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from btrfs_get_extent() and breaks out of its loop. An instance of -EEXIST was reported in the wild: https://bugzilla.redhat.com/show_bug.cgi?id=874407 I have no idea if that -EEXIST is surprising, or not. Regardless, this error handling should be cleaned up to handle other reasonable errors (ENOMEM, EIO; whatever). This seemed to be the only buggy freeing of the relatively rare IS_ERR em so I opted to fix the caller rather than teach free_extent_map() to use IS_ERR_OR_NULL(). Signed-off-by: Zach Brown <zab@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
2013-01-14Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extentsLiu Bo
xfstests case 285 complains. It it because btrfs did not try to find unwritten delalloc bytes(only dirty pages, not yet writeback) behind prealloc extents, it ends up finding nothing while we're with SEEK_DATA. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: fix off-by-one in lseekLiu Bo
Lock end is inclusive. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: reset path lock state to zeroLiu Bo
We forgot to reset the path lock state to zero after we unlock the path block, and this can lead to the ASSERT checker in tree unlock API. Reported-by: Slava Barinov <rayslava@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: let allocation start from the right raid typeLiu Bo
This'd avoid us empty looping. Say we have only one disk and the metadata raid type will be defaultly DUP, and we do not need to start from index=0(RAID10) and get over two empty loops to index=2(DUP). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: add orphan before truncating pagecacheJosef Bacik
Running xfstests 83 in a loop would sometimes fail the fsck. This happens because if we invalidate a page that already has an ordered extent setup for it we will complete the ordered extent ourselves, assuming that the truncate will clean everything up. The problem with this is there is plenty of time for the truncate to fail after we've done this work. So to fix this we need to add the orphan item first to make sure the cleanup gets done properly, and then we can truncate the pagecache and all that stuff and be safe. This fixes the btrfsck failures I was seeing while running 83 in a loop. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: set flushing if we're limited flushingJosef Bacik
We still need to say we're flushing if we're limit flushing to keep somebody from coming in and stealing our reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: fix missing write access release in btrfs_ioctl_resize()Miao Xie
We forget to give up the write access after we find some device operation is going on. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: fix resize a readonly deviceMiao Xie
We should not resize a readonly device, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14Btrfs: do not delete a subvolume which is in a R/O subvolumeMiao Xie
Step to reproduce: # mkfs.btrfs <disk> # mount <disk> <mnt> # btrfs sub create <mnt>/subv0 # btrfs sub snap <mnt> <mnt>/subv0/snap0 # change <mnt>/subv0 from R/W to R/O # btrfs sub del <mnt>/subv0/snap0 We deleted the snapshot successfully. I think we should not be able to delete the snapshot since the parent subvolume is R/O. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-01-14Btrfs: disable qgroup id 0Miao Xie
Qgroup id 0 is a special number, we should set the id of a qgroup to 0. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-01-14btrfs: get the device in write mode when deleting itLukas Czerner
When we're deleting the device we should get it in write mode since we're going to re-write the super block magic on that device. And it should fail if the device is read-only. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-01-14Btrfs: fix memory leak in name_cache_insert()Tsutomu Itoh
We should free name_cache_entry before returning from the error handling code. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2013-01-14Merge tag 'driver-core-3.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg Kroah-Hartman: "Here are two patches for 3.8-rc3. One removes the __dev* defines from init.h now that all usages of it are gone from your tree. The other fix is for debugfs's paramater that was using the wrong base for the option. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: convert gid= argument from decimal, not octal Remove __dev* markings from init.h
2013-01-14writeback: add more tracepointsTejun Heo
Add tracepoints for page dirtying, writeback_single_inode start, inode dirtying and writeback. For the latter two inode events, a pair of events are defined to denote start and end of the operations (the starting one has _start suffix and the one w/o suffix happens after the operation is complete). These inode ops are FS specific and can be non-trivial and having enclosing tracepoints is useful for external tracers. This is part of tracepoint additions to improve visiblity into dirtying / writeback operations for io tracer and userland. v2: writeback_dirty_inode[_start] TPs may be called for files on pseudo FSes w/ unregistered bdi. Check whether bdi->dev is %NULL before dereferencing. v3: buffer dirtying moved to a block TP. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-14block: add block_{touch|dirty}_buffer tracepointTejun Heo
The former is triggered from touch_buffer() and the latter mark_buffer_dirty(). This is part of tracepoint additions to improve visiblity into dirtying / writeback operations for io tracer and userland. v2: Transformed writeback_dirty_buffer to block_dirty_buffer and made it share TP definition with block_touch_buffer. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-14buffer: make touch_buffer() an exported functionTejun Heo
We want to add a trace point to touch_buffer() but macros and inline functions defined in header files can't have tracing points. Move touch_buffer() to fs/buffer.c and make it a proper function. The new exported function is also declared inline. As most uses of touch_buffer() are inside buffer.c with nilfs2 as the only other user, the effect of this change should be negligible. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-14block: add missing block_bio_complete() tracepointTejun Heo
bio completion didn't kick block_bio_complete TP. Only dm was explicitly triggering the TP on IO completion. This makes block_bio_complete TP useless for tracers which want to know about bios, and all other bio based drivers skip generating blktrace completion events. This patch makes all bio completions via bio_endio() generate block_bio_complete TP. * Explicit trace_block_bio_complete() invocation removed from dm and the trace point is unexported. * @rq dropped from trace_block_bio_complete(). bios may fly around w/o queue associated. Verifying and accessing the assocaited queue belongs to TP probes. * blktrace now gets both request and bio completions. Make it ignore bio completions if request completion path is happening. This makes all bio based drivers generate blktrace completion events properly and makes the block_bio_complete TP actually useful. v2: With this change, block_bio_complete TP could be invoked on sg commands which have bio's with %NULL bi_bdev. Update TP assignment code to check whether bio->bi_bdev is %NULL before dereferencing. Signed-off-by: Tejun Heo <tj@kernel.org> Original-patch-by: Namhyung Kim <namhyung@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-14f2fs: avoid redundant time update for parent directory in f2fs_delete_entryNamjae Jeon
In call to f2fs_delete_entry, 'dir' time modification code is put at two places. So, remove the redundant code for timing update. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-14f2fs: remove redundant call to set_blocksize in f2fs_fill_superNamjae Jeon
Since, f2fs supports only 4KB blocksize, which is set at the beginning in f2fs_fill_super. So, we do not need to again check this blocksize setting in such case. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-13fs/xfs remove obsolete simple_strto<foo>Abhijit Pawar
This patch replaces usages of obsolete simple_strtoul with kstrtoint in xfs_args and suffix_strtoul. Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-13xfs: recalculate leaf entry pointer after compacting a dir2 blockEric Sandeen
Dave Jones hit this assert when doing a compile on recent git, with CONFIG_XFS_DEBUG enabled: XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828 Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup) contained "2" and not the proper offset, and I found that this value was changed after the memmoves under "Use a stale leaf for our new entry." in xfs_dir2_block_addname(), i.e. memmove(&blp[mid + 1], &blp[mid], (highstale - mid) * sizeof(*blp)); overwrote it. What has happened is that the previous call to xfs_dir2_block_compact() has rearranged things; it changes btp->count as well as the blp array. So after we make that call, we must recalculate the proper pointer to the leaf entries by making another call to xfs_dir2_block_leaf_p(). Dave provided a metadump image which led to a simple reproducer (create a particular filename in the affected directory) and this resolves the testcase as well as the bug on his live system. Thanks also to dchinner for looking at this one with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-13ext4: trigger the lazy inode table initialization after resizeTheodore Ts'o
After we have finished extending the file system, we need to trigger a the lazy inode table thread to zero out the inode tables. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-12ext4: check bh in ext4_read_block_bitmap()Eryu Guan
Validate the bh pointer before using it, since ext4_read_block_bitmap_nowait() might return NULL. I've seen this in fsfuzz testing. EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0 ... Call Trace: [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60 [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80 [<ffffffff811d0d36>] ? __getblk+0x36/0x70 [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210 [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140 [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0 [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100 [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0 [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230 [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490 [<ffffffff811b8cd7>] evict+0xa7/0x1c0 [<ffffffff811b8ed3>] iput_final+0xe3/0x170 [<ffffffff811b8f9e>] iput+0x3e/0x50 [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90 [<ffffffff81231d0b>] ext4_create+0xeb/0x170 [<ffffffff811aae9c>] vfs_create+0xac/0xd0 [<ffffffff811ac845>] lookup_open+0x185/0x1c0 [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170 [<ffffffff811acb54>] do_last+0x2d4/0x7a0 [<ffffffff811af743>] path_openat+0xb3/0x480 [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff811afc49>] do_filp_open+0x49/0xa0 [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150 [<ffffffff8119da28>] do_sys_open+0x108/0x1f0 [<ffffffff8119db51>] sys_open+0x21/0x30 [<ffffffff81618959>] system_call_fastpath+0x16/0x1b Also fix comment for ext4_read_block_bitmap_nowait() Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-01-12ext4: use unlikely to improve the efficiency of the kernelWang Shilong
Because the function 'sb_getblk' seldomly fails to return NULL value,it will be better to use 'unlikely' to optimize it. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-12ext4: return ENOMEM if sb_getblk() failsTheodore Ts'o
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So ENOMEM is more appropriate than EIO. In addition, make sure that the file system is marked as being inconsistent if sb_getblk() fails. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-01-12vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename themMiao Xie
writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read() with down_read_trylock() because - If ->s_umount is write locked, then the sb is not idle. That is writeback_inodes_sb(_nr)_if_idle() needn't wait for the lock. - writeback_inodes_sb(_nr)_if_idle() grabs s_umount lock when it want to start writeback, it may bring us deadlock problem when doing umount. In order to fix the problem, ext4 and btrfs implemented their own writeback functions instead of writeback_inodes_sb(_nr)_if_idle(), but it introduced the redundant code, it is better to implement a new writeback_inodes_sb(_nr)_if_idle(). The name of these two functions is cumbersome, so rename them to try_to_writeback_inodes_sb(_nr). This idea came from Christoph Hellwig. Some code is from the patch of Kamal Mostafa. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2013-01-11fs/exec.c: work around icc miscompilationXi Wang
The tricky problem is this check: if (i++ >= max) icc (mis)optimizes this check as: if (++i > max) The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF). This is "allowed" by the C standard, assuming i++ never overflows, because signed integer overflow is undefined behavior. This optimization effectively reverts the previous commit 362e6663ef23 ("exec.c, compat.c: fix count(), compat_count() bounds checking") that tries to fix the check. This patch simply moves ++ after the check. Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11fs/xfs: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ben Myers <bpm@sgi.com>
2013-01-11fs/nilfs2: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2013-01-11fs/ecryptfs: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Tyler Hicks <tyhicks@canonical.com> CC: Dustin Kirkland <dustin.kirkland@gazzang.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Tyler Hicks <tyhicks@canonical.com>
2013-01-11fs/ceph: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Sage Weil <sage@inktank.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Sage Weil <sage@inktank.com>
2013-01-11pstore: Avoid deadlock in panic and emergency-restart pathSeiji Aguchi
[Issue] When pstore is in panic and emergency-restart paths, it may be blocked in those paths because it simply takes spin_lock. This is an example scenario which pstore may hang up in a panic path: - cpuA grabs psinfo->buf_lock - cpuB panics and calls smp_send_stop - smp_send_stop sends IRQ to cpuA - after 1 second, cpuB gives up on cpuA and sends an NMI instead - cpuA is now in an NMI handler while still holding buf_lock - cpuB is deadlocked This case may happen if a firmware has a bug and cpuA is stuck talking with it more than one second. Also, this is a similar scenario in an emergency-restart path: - cpuA grabs psinfo->buf_lock and stucks in a firmware - cpuB kicks emergency-restart via either sysrq-b or hangcheck timer. And then, cpuB is deadlocked by taking psinfo->buf_lock again. [Solution] This patch avoids the deadlocking issues in both panic and emergency_restart paths by introducing a function, is_non_blocking_path(), to check if a cpu can be blocked in current path. With this patch, pstore is not blocked even if another cpu has taken a spin_lock, in those paths by changing from spin_lock_irqsave to spin_trylock_irqsave. In addition, according to a comment of emergency_restart() in kernel/sys.c, spin_lock shouldn't be taken in an emergency_restart path to avoid deadlock. This patch fits the comment below. <snip> /** * emergency_restart - reboot the system * * Without shutting down any hardware or taking any locks * reboot the system. This is called when we know we are in * trouble so this is our best effort to reboot. This is * safe to call in interrupt context. */ void emergency_restart(void) <snip> Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-01-11debugfs: convert gid= argument from decimal, not octalDave Reisner
This patch technically breaks userspace, but I suspect that anyone who actually used this flag would have encountered this brokenness, declared it lunacy, and already sent a patch. Signed-off-by: Dave Reisner <dreisner@archlinux.org> Reviewed-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11f2fs: move f2fs_balance_fs to punch_holeJaegeuk Kim
The f2fs_fallocate() has two operations: punch_hole and expand_size. Only in the case of punch_hole, dirty node pages can be produced, so let's trigger f2fs_balance_fs() in this case only. Furthermore, let's trigger it at every data truncation routine. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>