summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-04-23locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" insteadJeff Layton
File-private locks have been re-christened as "open file description" locks. Finish the symbol name cleanup in the internal implementation. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-23xfs: add filestream allocator tracepointsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: remove xfs_filestream_associateChristoph Hellwig
There is no good reason to create a filestream when a directory entry is created. Delay it until the first allocation happens to simply the code and reduce the amount of mru cache lookups we do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: don't create a slab cache for filestream itemsChristoph Hellwig
We only have very few of these around, and allocation isn't that much of a hot path. Remove the slab cache to simplify the code, and to not waste any resources for the usual case of not having any inodes that use the filestream allocator. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: rewrite the filestream allocator using the dentry cacheChristoph Hellwig
In Linux we will always be able to find a parent inode for file that are undergoing I/O. Use this to simply the file stream allocator by only keeping track of parent inodes. Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-04-23xfs: remove XFS_IFILESTREAMChristoph Hellwig
We never test the flag except in xfs_inode_is_filestream, but that function already tests the on-disk flag or filesystem wide flags, and is used to decide if we want to set XFS_IFILESTREAM in the first place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: embedd mru_elem into parent structureChristoph Hellwig
There is no need to do a separate allocation for each mru element, just embedd the structure into the parent one in the user. Besides saving a memory allocation and the infrastructure required for it this also simplifies the API. While we do major surgery on xfs_mru_cache.c also de-typedef it and make struct mru_cache private to the implementation file. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: handle duplicate entries in xfs_mru_cache_insertChristoph Hellwig
The radix tree code can detect and reject duplicate keys at insert time. Make xfs_mru_cache_insert handle this case so that future changes to the filestream allocator can take advantage of this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23xfs: split xfs_bmap_btalloc_nullfbChristoph Hellwig
Split xfs_bmap_btalloc_nullfb into one function for filestream allocations and one for everything else that share a few helpers. This dramatically simplifies the control flow. Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-04-22fs/bio.c: remove nr_segs (unused function parameter)Fabian Frederick
nr_segs is no longer used in bio_alloc_map_data since c8db444820a1e3 ("block: Don't save/copy bvec array anymore") Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22fs/bio: remove bs paramater in biovec_create_poolFabian Frederick
bs is no longer used in biovec_create_pool since 9f060e2231ca96 ("block: Convert integrity to bvec_alloc_bs()") Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22fs/aio.c: Remove ctx parameter in kiocb_cancelFabian Frederick
ctx is no longer used in kiocb_cancel since 57282d8fd74407 ("aio: Kill ki_users") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-04-22locks: rename file-private locks to "open file description locks"Jeff Layton
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as "open file description locks". The name isn't a big deal for the kernel, but the command macros are not visually distinct enough from the traditional POSIX lock macros. The glibc and documentation folks are recommending that we change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a programmer will typo one of the commands wrong, and also makes it easier to spot this difference when reading code. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type "OFDLCK" Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Stefan Metzmacher <metze@samba.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frank Filz <ffilzlnx@mindspring.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-21ext4: remove obsoleted checkDmitry Monakhov
BH can not be NULL at this point, ext4_read_dirblock() always return non null value, and we already have done all necessery checks. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-04-21ext4: add a new spinlock i_raw_lock to protect the ext4's raw inodeTheodore Ts'o
To avoid potential data races, use a spinlock which protects the raw (on-disk) inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21ext4: fix locking for O_APPEND writesTheodore Ts'o
Al Viro pointed out that locking for O_APPEND writes was problematic, since the location of the write isn't known until after we take the i_mutex, which impacts the ext4_unaligned_aio() and s_bitmap_maxbytes check. For O_APPEND always assume that the write is unaligned so call ext4_unwritten_wait(). And to solve the second problem, take the i_mutex earlier before we start the s_bitmap_maxbytes check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-21ext4: factor out common code in ext4_file_write()Theodore Ts'o
This shouldn't change any logic flow; just delete duplicated code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21ext4: move ext4_file_dio_write() into ext4_file_write()Theodore Ts'o
This commit doesn't actually change anything; it just moves code around in preparation for some code simplification work. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21ext4: inline generic_file_aio_write() into ext4_file_write()Theodore Ts'o
Copy generic_file_aio_write() into ext4_file_write(). This is part of a patch series which allows us to simplify ext4_file_write() and ext4_file_dio_write(), by calling __generic_file_aio_write() directly. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21fs: fix new kernel-doc warnings in fs/bio.cRandy Dunlap
Fix new kernel-doc warnings in fs/bio.c: Warning(fs/bio.c:316): No description found for parameter 'bio' Warning(fs/bio.c:316): No description found for parameter 'parent' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-20ext4: rename uninitialized extents to unwrittenLukas Czerner
Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-20ext4: get rid of EXT4_MAP_UNINIT flagLukas Czerner
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the cases where we're using dioread_nolock and we're writing into either unallocated, or unwritten extent, because we need to make sure that any DIO write into that inode will wait for the extent conversion. However EXT4_MAP_UNINIT is not only entirely misleading name but also unnecessary because we can check for EXT4_MAP_UNWRITTEN in the dioread_nolock case instead. This commit removes EXT4_MAP_UNINIT flag. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-20Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "These are regression and bug fixes for ext4. We had a number of new features in ext4 during this merge window (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so there were many more regression and bug fixes this time around. It didn't help that xfstests hadn't been fully updated to fully stress test COLLAPSE_RANGE until after -rc1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: disable COLLAPSE_RANGE for bigalloc ext4: fix COLLAPSE_RANGE failure with 1KB block size ext4: use EINVAL if not a regular file in ext4_collapse_range() ext4: enforce we are operating on a regular file in ext4_zero_range() ext4: fix extent merging in ext4_ext_shift_path_extents() ext4: discard preallocations after removing space ext4: no need to truncate pagecache twice in collapse range ext4: fix removing status extents in ext4_collapse_range() ext4: use filemap_write_and_wait_range() correctly in collapse range ext4: use truncate_pagecache() in collapse range ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled ext4: always check ext4_ext_find_extent result ext4: fix error handling in ext4_ext_shift_extents ext4: silence sparse check warning for function ext4_trim_extent ext4: COLLAPSE_RANGE only works on extent-based files ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches ext4: use i_size_read in ext4_unaligned_aio() fs: disallow all fallocate operation on active swapfile fs: move falloc collapse range check into the filesystem methods ...
2014-04-19ext4: disable COLLAPSE_RANGE for bigallocNamjae Jeon
Once COLLAPSE RANGE is be disable for ext4 with bigalloc feature till finding root-cause of problem. It will be enable with fixing that regression of xfstest(generic 075 and 091) again. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19ext4: fix COLLAPSE_RANGE failure with 1KB block sizeNamjae Jeon
When formatting with 1KB or 2KB(not aligned with PAGE SIZE) block size, xfstests generic/075 and 091 are failing. The offset supplied to function truncate_pagecache_range is block size aligned. In this function start offset is re-aligned to PAGE_SIZE by rounding_up to the next page boundary. Due to this rounding up, old data remains in the page cache when blocksize is less than page size and start offset is not aligned with page size. In case of collapse range, we need to align start offset to page size boundary by doing a round down operation instead of round up. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19coredump: fix va_list corruptionEric Dumazet
A va_list needs to be copied in case it needs to be used twice. Thanks to Hugh for debugging this issue, leading to various panics. Tested: lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern 'produce_core' is simply : main() { *(int *)0 = 1;} lpq84:~# ./produce_core Segmentation fault (core dumped) lpq84:~# dmesg | tail -1 [ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed Notice the last argument was replaced by a NULL (we were lucky enough to not crash, but do not try this on your production machine !) After fix : lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern lpq83:~# ./produce_core Segmentation fault lpq83:~# dmesg | tail -1 [ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Hugh Dickins <hughd@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-19fix races between __d_instantiate() and checks of dentry flagsAl Viro
in non-lazy walk we need to be careful about dentry switching from negative to positive - both ->d_flags and ->d_inode are updated, and in some places we might see only one store. The cases where dentry has been obtained by dcache lookup with ->i_mutex held on parent are safe - ->d_lock and ->i_mutex provide all the barriers we need. However, there are several places where we run into trouble: * do_last() fetches ->d_inode, then checks ->d_flags and assumes that inode won't be NULL unless d_is_negative() is true. Race with e.g. creat() - we might have fetched the old value of ->d_inode (still NULL) and new value of ->d_flags (already not DCACHE_MISS_TYPE). Lin Ming has observed and reported the resulting oops. * a bunch of places checks ->d_inode for being non-NULL, then checks ->d_flags for "is it a symlink". Race with symlink(2) in case if our CPU sees ->d_inode update first - we see non-NULL there, but ->d_flags still contains DCACHE_MISS_TYPE instead of DCACHE_SYMLINK_TYPE. Result: false negative on "should we follow link here?", with subsequent unpleasantness. Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one Reported-and-tested-by: Lin Ming <minggr@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-18Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A set of 5 small cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cif: fix dead code cifs: fix error handling cifs_user_readv fs: cifs: remove unused variable. Return correct error on query of xattr on file with empty xattrs cifs: Wait for writebacks to complete before attempting write.
2014-04-18Merge tag 'driver-core-3.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes for 3.15-rc2. Also in here are some documentation updates, as well as an API removal that had to wait for after -rc1 due to the cleanups coming into you from multiple developer trees (this one and the PPC tree.) All have been in linux next successfully" * tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers/base/dd.c incorrect pr_debug() parameters Documentation: Update stable address in Chinese and Japanese translations topology: Fix compilation warning when not in SMP Chinese: add translation of io_ordering.txt stable_kernel_rules: spelling/word usage sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() kernfs: protect lazy kernfs_iattrs allocation with mutex fs: Don't return 0 from get_anon_bdev
2014-04-18ext4: use EINVAL if not a regular file in ext4_collapse_range()Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: enforce we are operating on a regular file in ext4_zero_range()jon ernst
Signed-off-by: Jon Ernst <jonernst07@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: fix extent merging in ext4_ext_shift_path_extents()Lukas Czerner
There is a bug in ext4_ext_shift_path_extents() where if we actually manage to merge a extent we would skip shifting the next extent. This will result in in one extent in the extent tree not being properly shifted. This is causing failure in various xfstests tests using fsx or fsstress with collapse range support. It will also cause file system corruption which looks something like: e2fsck 1.42.9 (4-Feb-2014) Pass 1: Checking inodes, blocks, and sizes Inode 20 has out of order extents (invalid logical block 3, physical block 492938, len 2) Clear? yes ... when running e2fsck. It's also very easily reproducible just by running fsx without any parameters. I can usually hit the problem within a minute. Fix it by increasing ex_start only if we're not merging the extent. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18ext4: discard preallocations after removing spaceLukas Czerner
Currently in ext4_collapse_range() and ext4_punch_hole() we're discarding preallocation twice. Once before we attempt to do any changes and second time after we're done with the changes. While the second call to ext4_discard_preallocations() in ext4_punch_hole() case is not needed, we need to discard preallocation right after ext4_ext_remove_space() in collapse range case because in the case we had to restart a transaction in the middle of removing space we might have new preallocations created. Remove unneeded ext4_discard_preallocations() ext4_punch_hole() and move it to the better place in ext4_collapse_range() Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: no need to truncate pagecache twice in collapse rangeLukas Czerner
We're already calling truncate_pagecache() before we attempt to do any actual job so there is not need to truncate pagecache once more using truncate_setsize() after we're finished. Remove truncate_setsize() and replace it just with i_size_write() note that we're holding appropriate locks. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: fix removing status extents in ext4_collapse_range()Lukas Czerner
Currently in ext4_collapse_range() when calling ext4_es_remove_extent() to remove status extents we're passing (EXT_MAX_BLOCKS - punch_start - 1) in order to remove all extents from start of the collapse range to the end of the file. However this is wrong because we might miss the possible extent covering the last block of the file. Fix it by removing the -1. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18ext4: use filemap_write_and_wait_range() correctly in collapse rangeLukas Czerner
Currently we're passing -1 as lend argumnet for filemap_write_and_wait_range() which is wrong since lend is signed type so it would cause some confusion and we might not write_and_wait for the entire range we're expecting to write. Fix it by using LLONG_MAX instead. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18ext4: use truncate_pagecache() in collapse rangeLukas Czerner
We should be using truncate_pagecache() instead of truncate_pagecache_range() in the collapse range because we're truncating page cache from offset to the end of file. truncate_pagecache() also get rid of the private COWed pages from the range because we're going to shift the end of the file. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18Revert "nfsd4: fix nfs4err_resource in 4.1 case"J. Bruce Fields
Since we're still limiting attributes to a page, the result here is that a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE instead of NFS4ERR_RESOURCE. Both error returns are wrong, and the real bug here is the arbitrary limit on getattr results, fixed by as-yet out-of-tree patches. But at a minimum we can make life easier for clients by sticking to one broken behavior in released kernels instead of two.... Trond says: one immediate consequence of this patch will be that NFSv4.1 clients will now report EIO instead of EREMOTEIO if they hit the problem. That may make debugging a little less obvious. Another consequence will be that if we ever do try to add client side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal with the “handle existing buggy server” syndrome. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18nfsd: set timeparms.to_maxval in setup_callback_clientJeff Layton
...otherwise the logic in the timeout handling doesn't work correctly. Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18locks: allow __break_lease to sleep even when break_time is 0Jeff Layton
A fl->fl_break_time of 0 has a special meaning to the lease break code that basically means "never break the lease". knfsd uses this to ensure that leases don't disappear out from under it. Unfortunately, the code in __break_lease can end up passing this value to wait_event_interruptible as a timeout, which prevents it from going to sleep at all. This causes __break_lease to spin in a tight loop and causes soft lockups. Fix this by ensuring that we pass a minimum value of 1 as a timeout instead. Cc: <stable@vger.kernel.org> Cc: J. Bruce Fields <bfields@fieldses.org> Reported-by: Terry Barnaby <terry1@beam.ltd.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18sched, treewide: Replace hardcoded nice values with MIN_NICE/MAX_NICEDongsheng Yang
Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com Cc: devel@driverdev.osuosl.org Cc: devicetree@vger.kernel.org Cc: fcoe-devel@open-fcoe.org Cc: linux390@de.ibm.com Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: nbd-general@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: openipmi-developer@lists.sourceforge.net Cc: qla2xxx-upstream@qlogic.com Cc: linux-arch@vger.kernel.org [ Consolidated the patches, twiddled the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17GFS2: quotas not being refreshed in gfs2_adjust_quotaAbhi Das
Old values of user quota limits were being used and could allow users to exceed their allotted quotas. This patch refreshes the limits to the latest values so that quotas are enforced correctly. Resolves: rhbz#1077463 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-04-16cif: fix dead codeMichael Opdenacker
This issue was found by Coverity (CID 1202536) This proposes a fix for a statement that creates dead code. The "rc < 0" statement is within code that is run with "rc > 0". It seems like "err < 0" was meant to be used here. This way, the error code is returned by the function. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steve French <smfrench@gmail.com>
2014-04-16cifs: fix error handling cifs_user_readvJeff Layton
Coverity says: *** CID 1202537: Dereference after null check (FORWARD_NULL) /fs/cifs/file.c: 2873 in cifs_user_readv() 2867 cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize); 2868 npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); 2869 2870 /* allocate a readdata struct */ 2871 rdata = cifs_readdata_alloc(npages, 2872 cifs_uncached_readv_complete); >>> CID 1202537: Dereference after null check (FORWARD_NULL) >>> Comparing "rdata" to null implies that "rdata" might be null. 2873 if (!rdata) { 2874 rc = -ENOMEM; 2875 goto error; 2876 } 2877 2878 rc = cifs_read_allocate_pages(rdata, npages); ...when we "goto error", rc will be non-zero, and then we end up trying to do a kref_put on the rdata (which is NULL). Fix this by replacing the "goto error" with a "break". Reported-by: <scan-admin@coverity.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-04-17xfs: fix tmpfile/selinux deadlock and initialize securityBrian Foster
xfstests generic/004 reproduces an ilock deadlock using the tmpfile interface when selinux is enabled. This occurs because xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The latter eventually calls into xfs_xattr_get() which attempts to get the lock again. E.g.: xfs_io D ffffffff81c134c0 4096 3561 3560 0x00000080 ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8 00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540 ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480 Call Trace: [<ffffffff8177f969>] schedule+0x29/0x70 [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120 [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0 [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs] [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs] [<ffffffff8124842f>] generic_getxattr+0x4f/0x70 [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650 [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20 [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30 [<ffffffff81237db0>] d_instantiate+0x50/0x70 [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0 [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs] [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs] [<ffffffff81230388>] path_openat+0x228/0x6a0 [<ffffffff810230f9>] ? sched_clock+0x9/0x10 [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8123101a>] do_filp_open+0x3a/0x90 [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210 [<ffffffff8121e4ce>] SyS_open+0x1e/0x20 [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b xfs_vn_tmpfile() also fails to initialize security on the newly created inode. Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction has been committed and the inode unlocked. Also, initialize security on the inode based on the parent directory provided via the tmpfile call. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: fix buffer use after free on IO errorEric Sandeen
When testing exhaustion of dm snapshots, the following appeared with CONFIG_DEBUG_OBJECTS_FREE enabled: ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs] indicating that we'd freed a buffer which still had a pending reference, down this path: [ 190.867975] [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270 [ 190.880820] [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370 [ 190.892615] [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs] [ 190.905629] [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs] [ 190.911770] [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs] At issue is the fact that if IO fails in xfs_buf_iorequest, we'll queue completion unconditionally, and then call xfs_buf_rele; but if IO failed, there are no IOs remaining, and xfs_buf_rele will free the bp while work is still queued. Fix this by not scheduling completion if the buffer has an error on it; run it immediately. The rest is only comment changes. Thanks to dchinner for spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: wrong error sign conversion during failed DIO writesDave Chinner
We negate the error value being returned from a generic function incorrectly. The code path that it is running in returned negative errors, so there is no need to negate it to get the correct error signs here. This was uncovered by generic/019. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: unmount does not wait for shutdown during unmountDave Chinner
And interesting situation can occur if a log IO error occurs during the unmount of a filesystem. The cases reported have the same signature - the update of the superblock counters fails due to a log write IO error: XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1 XFS (dm-16): Log I/O Error Detected. Shutting down filesystem XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount. XFS (dm-16): xfs_log_force: error 5 returned. XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s) It can be seen that the last line of output contains a corrupt device name - this is because the log and xfs_mount structures have already been freed by the time this message is printed. A kernel oops closely follows. The issue is that the shutdown is occurring in a separate IO completion thread to the unmount. Once the shutdown processing has started and all the iclogs are marked with XLOG_STATE_IOERROR, the log shutdown code wakes anyone waiting on a log force so they can process the shutdown error. This wakes up the unmount code that is doing a synchronous transaction to update the superblock counters. The unmount path now sees all the iclogs are marked with XLOG_STATE_IOERROR and so never waits on them again, knowing that if it does, there will not be a wakeup trigger for it and we will hang the unmount if we do. Hence the unmount runs through all the remaining code and frees all the filesystem structures while the xlog_iodone() is still processing the shutdown. When the log shutdown processing completes, xfs_do_force_shutdown() emits the "Please umount the filesystem and rectify the problem(s)" message, and xlog_iodone() then aborts all the objects attached to the iclog. An iclog that has already been freed.... The real issue here is that there is no serialisation point between the log IO and the unmount. We have serialisations points for log writes, log forces, reservations, etc, but we don't actually have any code that wakes for log IO to fully complete. We do that for all other types of object, so why not iclogbufs? Well, it turns out that we can easily do this. We've got xfs_buf handles, and that's what everyone else uses for IO serialisation. i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only release the lock in xlog_iodone() when we are finished with the buffer. That way before we tear down the iclog, we can lock and unlock the buffer to ensure IO completion has finished completely before we tear it down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bob Mastors <bob.mastors@solidfire.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-17xfs: collapse range is delalloc challengedDave Chinner
FSX has been detecting data corruption after to collapse range calls. The key observation is that the offset of the last extent in the file was not being shifted, and hence when the file size was adjusted it was truncating away data because the extents handled been correctly shifted. Tracing indicated that before the collapse, the extent list looked like: .... ino 0x5788 state idx 6 offset 26 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 39 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 and after the shift of 2 blocks: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 Note that the last extent did not change offset. After the changing of the file size: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 30 flag 0 You can see that the last extent had it's length truncated, indicating that we've lost data. The reason for this is that the xfs_bmap_shift_extents() loop uses XFS_IFORK_NEXTENTS() to determine how many extents are in the inode. This, unfortunately, doesn't take into account delayed allocation extents - it's a count of physically allocated extents - and hence when the file being collapsed has a delalloc extent like this one does prior to the range being collapsed: .... ino 0x5788 state idx 4 offset 11 block 4503599627239429 count 1 flag 0 .... it gets the count wrong and terminates the shift loop early. Fix it by using the in-memory extent array size that includes delayed allocation extents to determine the number of extents on the inode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>