summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-06-11nfs: Only update callback sequnce id when CB_SEQUENCE successKinglong Mee
When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED. It is caused by nfs return NFS4ERR_DELAY before validate_seqid(), don't update the sequnce id, but nfsd updates the sequnce id !!! According to RFC5661 20.9.3, " If CB_SEQUENCE returns an error, then the state of the slot (sequence ID, cached reply) MUST NOT change. " Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10NFS: Convert use of __constant_htonl to htonlVaishali Thakkar
In little endian cases, the macro htonl unfolds to __swab32 which provides special case for constants. In big endian cases, __constant_htonl and htonl expand directly to the same expression. So, replace __constant_htonl with htonl with the goal of getting rid of the definition of __constant_htonl completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ - __constant_htonl(x) + htonl(x) Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10NFS: Remove unused nfs_rw_ops->rw_release() functionAnna Schumaker
This was only ever set to nfs_writeback_release_common(), a function which is completely empty. Let's just drop this function pointer and simplify the code a bit. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10NFSv4: handle nfs4_get_referral failureDominique Martinet
nfs4_proc_lookup_common is supposed to return a posix error, we have to handle any error returned that isn't errno Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10sunrpc: keep a count of swapfiles associated with the rpc_clntJeff Layton
Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10btrfs: wait for delayed iputs on no spaceZhao Lei
btrfs will report no_space when we run following write and delete file loop: # FILE_SIZE_M=[ 75% of fs space ] # DEV=[ some dev ] # MNT=[ some dir ] # # mkfs.btrfs -f "$DEV" # mount -o nodatacow "$DEV" "$MNT" # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done # Reason: iput() and evict() is run after write pages to block device, if write pages work is not finished before next write, the "rm"ed space is not freed, and caused above bug. Fix: We can add "-o flushoncommit" mount option to avoid above bug, but it have performance problem. Actually, we can to wait for on-the-fly writes only when no-space happened, it is which this patch do. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Make snapshot accounting work with new extent-orientedQu Wenruo
qgroup. Make snapshot accounting work with new extent-oriented mechanism by skipping given root in new/old_roots in create_pending_snapshot(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots.Qu Wenruo
This is used by later qgroup fix patches for snapshot. As current snapshot accounting is done by btrfs_qgroup_inherit(), but new extent oriented quota mechanism will account extent from btrfs_copy_root() and other snapshot things, causing wrong result. So add this ability to handle snapshot accounting. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: ulist: Add ulist_del() function.Qu Wenruo
This function will delete unode with given (val,aux) pair. And with this patch, seqnum for debug usage doesn't have any meaning now, so remove them. This is used by later patches to skip snapshot root. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.Qu Wenruo
Goodbye, the old mechanisim. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.Qu Wenruo
Since the self test transaction don't have delayed_ref_roots, so use find_all_roots() and export btrfs_qgroup_account_extent() to simulate it Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.Qu Wenruo
Switch from old ref_node based qgroup to extent based qgroup mechanism for normal operations. The new mechanism should hugely reduce the overhead of btrfs quota system, and further more, the codes and logic should be more clean and easier to maintain. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Switch rescan to new mechanism.Qu Wenruo
Switch rescan to use the new new extent oriented mechanism. As rescan is also based on extent, new mechanism is just a perfect match for rescan. With re-designed internal functions, rescan is quite easy, just call btrfs_find_all_roots() and then btrfs_qgroup_account_one_extent(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Add new qgroup calculation functionQu Wenruo
btrfs_qgroup_account_extents(). The new btrfs_qgroup_account_extents() function should be called in btrfs_commit_transaction() and it will update all the qgroup according to delayed_ref_root->dirty_extent_root. The new function can handle both normal operation during commit_transaction() or in rescan in a unified method with clearer logic. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: backref: Add special time_seq == (u64)-1 case forQu Wenruo
btrfs_find_all_roots(). Allow btrfs_find_all_roots() to skip all delayed_ref_head lock and tree lock to do tree search. This is important for later qgroup implement which will call find_all_roots() after fs trees are committed. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Add new function to record old_roots.Qu Wenruo
Add function btrfs_qgroup_prepare_account_extents() to get old_roots which are needed for qgroup. We do it in commit_transaction() and before switch_roots(), and only search commit_root, so it gives a quite accurate view for previous transaction. With old_roots from previous transaction, we can use it to do accurate account with current transaction. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Record possible quota-related extent for qgroup.Qu Wenruo
Add hook in add_delayed_ref_head() to record quota-related extent record into delayed_ref_root->dirty_extent_record rb-tree for later qgroup accounting. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Add function qgroup_update_counters().Qu Wenruo
Add function qgroup_update_counters(), which will update related qgroups' rfer/excl according to old/new_roots. This is one of the two core functions for the new qgroup implement. This is based on btrfs_adjust_coutners() but with clearer logic and comment. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Add function qgroup_update_refcnt().Qu Wenruo
This function is used to update refcnt for qgroups. And is one of the two core functions used in the new qgroup implement. This is based on the old update_old/new_refcnt, but provides a unified logic and behavior. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: extent-tree: Use ref_node to replace unneeded parameters in ↵Qu Wenruo
__inc_extent_ref() and __free_extent() __btrfs_inc_extent_ref() and __btrfs_free_extent() have already had too many parameters, but three of them can be extracted from btrfs_delayed_ref_node struct. So use btrfs_delayed_ref_node struct as a single parameter to replace the bytenr/num_byte/no_quota parameters. The real objective of this patch is to allow btrfs_qgroup_record_ref() get the delayed_ref_node in incoming qgroup patches. Other functions calling btrfs_qgroup_record_ref() are not affected since the rest will only add/sub exclusive extents, where node is not used. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: qgroup: Cleanup open-coded old/new_refcnt update and read.Qu Wenruo
Use inline functions to do such things, to improve readability. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Acked-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: delayed-ref: Cleanup the unneeded functions.Qu Wenruo
Cleanup the rb_tree merge/insert/update functions, since now we use list instead of rb_tree now. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: delayed-ref: Use list to replace the ref_root in ref_head.Qu Wenruo
This patch replace the rbtree used in ref_head to list. This has the following advantage: 1) Easier merge logic. With the new list implement, we only need to care merging the tail ref_node with the new ref_node. And this can be done quite easy at insert time, no need to do a indicated merge at run_delayed_refs(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: backref: Don't merge refs which are not for same block.Qu Wenruo
Old __merge_refs() in backref.c will even merge refs whose root_id are different, which makes qgroup gives wrong result. Fix it by checking ref_for_same_block() before any mode specific works. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx()Zhao Lei
lockdep report following warning in test: [25176.843958] ================================= [25176.844519] [ INFO: inconsistent lock state ] [25176.845047] 4.1.0-rc3 #22 Tainted: G W [25176.845591] --------------------------------- [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes: [25176.847246] (&wr_ctx->wr_lock){+.?...}, at: [<ffffffffa04cdc6d>] scrub_free_ctx+0x2d/0xf0 [btrfs] [25176.847838] {SOFTIRQ-ON-W} state was registered at: [25176.848396] [<ffffffff810bf460>] __lock_acquire+0x6a0/0xe10 [25176.848955] [<ffffffff810bfd1e>] lock_acquire+0xce/0x2c0 [25176.849491] [<ffffffff816489af>] mutex_lock_nested+0x7f/0x410 [25176.850029] [<ffffffffa04d04ff>] scrub_stripe+0x4df/0x1080 [btrfs] [25176.850575] [<ffffffffa04d11b1>] scrub_chunk.isra.19+0x111/0x130 [btrfs] [25176.851110] [<ffffffffa04d144c>] scrub_enumerate_chunks+0x27c/0x510 [btrfs] [25176.851660] [<ffffffffa04d3b87>] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs] [25176.852189] [<ffffffffa04e918e>] btrfs_dev_replace_start+0x36e/0x450 [btrfs] [25176.852771] [<ffffffffa04a98e0>] btrfs_ioctl+0x1e10/0x2d20 [btrfs] [25176.853315] [<ffffffff8121c5b8>] do_vfs_ioctl+0x318/0x570 [25176.853868] [<ffffffff8121c851>] SyS_ioctl+0x41/0x80 [25176.854406] [<ffffffff8164da17>] system_call_fastpath+0x12/0x6f [25176.854935] irq event stamp: 51506 [25176.855511] hardirqs last enabled at (51506): [<ffffffff810d4ce5>] vprintk_emit+0x225/0x5e0 [25176.856059] hardirqs last disabled at (51505): [<ffffffff810d4b77>] vprintk_emit+0xb7/0x5e0 [25176.856642] softirqs last enabled at (50886): [<ffffffff81067a23>] __do_softirq+0x363/0x640 [25176.857184] softirqs last disabled at (50949): [<ffffffff8106804d>] irq_exit+0x10d/0x120 [25176.857746] other info that might help us debug this: [25176.858845] Possible unsafe locking scenario: [25176.859981] CPU0 [25176.860537] ---- [25176.861059] lock(&wr_ctx->wr_lock); [25176.861705] <Interrupt> [25176.862272] lock(&wr_ctx->wr_lock); [25176.862881] *** DEADLOCK *** Reason: Above warning is caused by: Interrupt -> bio_endio() -> ... -> scrub_put_ctx() -> scrub_free_ctx() *1 -> ... -> mutex_lock(&wr_ctx->wr_lock); scrub_put_ctx() is allowed to be called in end_bio interrupt, but in code design, it will never call scrub_free_ctx(sctx) in interrupe context(above *1), because btrfs_scrub_dev() get one additional reference of sctx->refs, which makes scrub_free_ctx() only called withine btrfs_scrub_dev(). Now the code runs out of our wish, because free sequence in scrub_pending_bio_dec() have a gap. Current code: -----------------------------------+----------------------------------- scrub_pending_bio_dec() | btrfs_scrub_dev -----------------------------------+----------------------------------- atomic_dec(&sctx->bios_in_flight); | wake_up(&sctx->list_wait); | | scrub_put_ctx() | -> atomic_dec_and_test(&sctx->refs) scrub_put_ctx(sctx); | -> atomic_dec_and_test(&sctx->refs)| -> scrub_free_ctx() | -----------------------------------+----------------------------------- We expected: -----------------------------------+----------------------------------- scrub_pending_bio_dec() | btrfs_scrub_dev -----------------------------------+----------------------------------- atomic_dec(&sctx->bios_in_flight); | wake_up(&sctx->list_wait); | scrub_put_ctx(sctx); | -> atomic_dec_and_test(&sctx->refs)| | scrub_put_ctx() | -> atomic_dec_and_test(&sctx->refs) | -> scrub_free_ctx() -----------------------------------+----------------------------------- Fix: Move scrub_pending_bio_dec() to a workqueue, to avoid this function run in interrupt context. Tested by check tracelog in debug. Changelog v1->v2: Use workqueue instead of adjust function call sequence in v1, because v1 will introduce a bug pointed out by: Filipe David Manana <fdmanana@gmail.com> Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10btrfs: Handle unaligned length in extent_sameMark Fasheh
The extent-same code rejects requests with an unaligned length. This poses a problem when we want to dedupe the tail extent of files as we skip cloning the portion between i_size and the extent boundary. If we don't clone the entire extent, it won't be deleted. So the combination of these behaviors winds up giving us worst-case dedupe on many files. We can fix this by allowing a length that extents to i_size and internally aligining those to the end of the block. This is what btrfs_ioctl_clone() so we can just copy that check over. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Btrfs: btrfs_defrag_file: Fix calculation of max_to_defrag.chandan
max_to_defrag represents the number of pages to defrag rather than the last page of the file range to be defragged. Consider a file having 10 4k blocks (i.e. blocks in the range [0 - 9]). If the defrag ioctl was invoked for the block range [3 - 6], then max_to_defrag should actually have the value 4. Instead in the current code we end up setting it to 6. Now, this does not (yet) cause an issue since the first part of the while loop condition in btrfs_defrag_file() (i.e. "i <= last_index") causes the control to flow out of the while loop before any buggy behavior is actually caused. So the patch just makes sure that max_to_defrag ends up having the right value rather than fixing a bug. I did run the xfstests suite to make sure that the code does not regress. Changelog: v1->v2: Provide a much descriptive commit message. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Btrfs: btrfs_defrag_file: Fix ra_index computation.chandan
Read-ahead is done for the pages in the range [ra_index, ra_index + cluster - 1]. So the next read-ahead should be starting from the page at index 'ra_index + cluster' (unless we deemed that the extent at 'ra_index + cluster' as non-defraggable) rather than from the page at index 'ra_index + max_cluster'. This patch fixes this. I did run the xfstests suite to make sure that the code does not regress. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Btrfs: fix necessary chunk tree space calculation when allocating a chunkFilipe Manana
When allocating a new chunk or removing one we need to update num_devs device items and insert or remove a chunk item in the chunk tree, so in the worst case the space needed in the chunk space_info is: btrfs_calc_trunc_metadata_size(chunk_root, num_devs) + btrfs_calc_trans_metadata_size(chunk_root, 1) That is, in the worst case we need to cow num_devs paths and cow 1 other path that can result in splitting every node and leaf, and each path consisting of BTRFS_MAX_LEVEL - 1 nodes and 1 leaf. We were requiring some additional chunk_root->nodesize * BTRFS_MAX_LEVEL * num_devs bytes, which were unnecessary since updating the existing device items does not result in splitting the nodes and leaf since after updating them they remain with the same size. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Btrfs: don't attach unnecessary extents to transaction on fsyncFilipe Manana
We don't need to attach ordered extents that have completed to the current transaction. Doing so only makes us hold memory for longer than necessary and delaying the iput of the inode until the transaction is committed (for each created ordered extent we do an igrab and then schedule an asynchronous iput when the ordered extent's reference count drops to 0), preventing the inode from being evictable until the transaction commits. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Btrfs: avoid syncing log in the fast fsync path when not necessaryFilipe Manana
Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") added a performance regression for that causes an unnecessary sync of the log trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done against a file, without no writes or any metadata updates to the inode in between them and if a transaction is committed before the second fsync is called. Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99) after a test sysbench test that measured a -62% decrease of file io requests per second for that tests' workload. The test is: echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor mkfs -t btrfs /dev/sda2 mount -t btrfs /dev/sda2 /fs/sda2 cd /fs/sda2 for ((i = 0; i < 1024; i++)); do fallocate -l 67108864 testfile.$i; done sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \ --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \ --file-num=1024 run A test on kvm guest, running a debug kernel gave me the following results: Without 3a8b36f378060d: 16.01 reqs/sec With 3a8b36f378060d: 3.39 reqs/sec With 3a8b36f378060d and this patch: 16.04 reqs/sec Reported-by: Huang Ying <ying.huang@intel.com> Tested-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10Merge branch 'send_fixes_4.2' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.2
2015-06-09f2fs: drop the volatile_write flag onlyJaegeuk Kim
When aborting volatile_writes, let's drop its flag and give up any further volatile_writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-09gfs2: Don't support fallocate on jdata filesAbhi Das
We cannot provide an efficient implementation due to the headers on the data blocks, so there doesn't seem much point in having it. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-06-09ext4: Add support FALLOC_FL_INSERT_RANGE for fallocateNamjae Jeon
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying between [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
2015-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-06-08f2fs: skip committing valid superblockChao Yu
In recovery procedure for superblock, we try to write data of valid superblock into invalid one for recovery, work should be finished here, but then still we will write the valid one with its original data. This operation is not needed. Let's skip doing this unnecessary work. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-08f2fs: setting discard option in parse_options()Chao Yu
For the first mount of f2fs image with realtime discard option, we will disable discard option if device is not supported, but for remount operation, our discard option can still be set, this should be avoided. This patch moves configuring of discard option to parse_options() to fix this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-08Merge 4.1-rc7 into driver-core-nextGreg Kroah-Hartman
We want the fixes in this branch as well for testing and merge resolution. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08jbd2: speedup jbd2_journal_get_[write|undo]_access()Jan Kara
jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are frequently called for buffers that are already part of the running transaction - most frequently it is the case for bitmaps, inode table blocks, and superblock. Since in such cases we have nothing to do, it is unfortunate we still grab reference to journal head, lock the bh, lock bh_state only to find out there's nothing to do. Improving this is a bit subtle though since until we find out journal head is attached to the running transaction, it can disappear from under us because checkpointing / commit decided it's no longer needed. We deal with this by protecting journal_head slab with RCU. We still have to be careful about journal head being freed & reallocated within slab and about exposing journal head in consistent state (in particular b_modified and b_frozen_data must be in correct state before we allow user to touch the buffer). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08jbd2: more simplifications in do_get_write_access()Jan Kara
Check for the simple case of unjournaled buffer first, handle it and bail out. This allows us to remove one if and unindent the difficult case by one tab. The result is easier to read. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08jbd2: simplify error path on allocation failure in do_get_write_access()Jan Kara
We were acquiring bh_state_lock when allocation of buffer failed in do_get_write_access() only to be able to jump to a label that releases the lock and does all other checks that don't make sense for this error path. Just jump into the right label instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08jbd2: simplify code flow in do_get_write_access()Jan Kara
needs_copy is set only in one place in do_get_write_access(), just move the frozen buffer copying into that place and factor it out to a separate function to make do_get_write_access() slightly more readable. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08ext4 crypto: fix sparse warnings in fs/ext4/ioctl.cFabian Frederick
[ Added another sparse fix for EXT4_IOC_GET_ENCRYPTION_POLICY while we're at it. --tytso ] Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08gfs2: s64 cast for negative quota valueAbhi Das
One-line fix to cast quota value to s64 before comparison. By default the quantity is treated as u64. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-06-08ext4: BUG_ON assertion repeated for inode1, not done for inode2David Moore
During a source code review of fs/ext4/extents.c I noted identical consecutive lines. An assertion is repeated for inode1 and never done for inode2. This is not in keeping with the rest of the code in the ext4_swap_extents function and appears to be a bug. Assert that the inode2 mutex is not locked. Signed-off-by: David Moore <dmoorefo@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2015-06-08ext4 crypto: fix ext4_get_crypto_ctx()'s calling convention in ext4_decrypt_oneTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08ext4: return error code from ext4_mb_good_group()Lukas Czerner
Currently ext4_mb_good_group() only returns 0 or 1 depending on whether the allocation group is suitable for use or not. However we might get various errors and fail while initializing new group including -EIO which would never get propagated up the call chain. This might lead to an endless loop at writeback when we're trying to find a good group to allocate from and we fail to initialize new group (read error for example). Fix this by returning proper error code from ext4_mb_good_group() and using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator() we will always return only the first occurred error from ext4_mb_good_group() and we only propagate it back to the caller if we do not get any other errors and we fail to allocate any blocks. Note that with other modes than errors=continue, we will fail immediately in ext4_mb_good_group() in case of error, however with errors=continue we should try to continue using the file system, that's why we're not going to fail immediately when we see an error from ext4_mb_good_group(), but rather when we fail to find a suitable block group to allocate from due to an problem in group initialization. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2015-06-08ext4: try to initialize all groups we can in case of failure on ppc64Lukas Czerner
Currently on the machines with page size > block size when initializing block group buddy cache we initialize it for all the block group bitmaps in the page. However in the case of read error, checksum error, or if a single bitmap is in any way corrupted we would fail to initialize all of the bitmaps. This is problematic because we will not have access to the other allocation groups even though those might be perfectly fine and usable. Fix this by reading all the bitmaps instead of error out on the first problem and simply skip the bitmaps which were either not read properly, or are not valid. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-08ext4: verify block bitmap even after fresh initializationLukas Czerner
If we want to rely on the buffer_verified() flag of the block bitmap buffer, we have to set it consistently. However currently if we're initializing uninitialized block bitmap in ext4_read_block_bitmap_nowait() we're not going to set buffer verified at all. We can do this by simply setting the flag on the buffer, but I think it's actually better to run ext4_validate_block_bitmap() to make sure that what we did in the ext4_init_block_bitmap() is right. So run ext4_validate_block_bitmap() even after the block bitmap initialization. Also bail out early from ext4_validate_block_bitmap() if we see corrupt bitmap, since we already know it's corrupt and we do not need to verify that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>