summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-04-02ceph: optimizing cap reservationChengguang Xu
When caps_avail_count is in a low level, most newly trimmed caps will probably go into ->caps_list and caps_avail_count will be increased. Hence after trimming, should recheck caps_avail_count to effectly reuse newly trimmed caps. Also, when releasing unnecessary caps follow the same rule of ceph_put_cap. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: release unreserved caps if having enough available capsChengguang Xu
When unreserving caps check if there is too mamy available caps in the ->caps_list, if so release unreserved caps. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: optimizing cap allocationChengguang Xu
When setting high volume of caps_min_count or having many unreserved caps, unused caps may always keep in the ->caps_list even can't get new cap from kmem_cache_alloc because lack of maximum limitation of caps_avail_count. Hence reuse caps in ->caps_list if available, it's maybe better than setting max limitation of caps_avail_count and releasing unused caps when reaching the limit. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: adding protection for showing cap reservation infoChengguang Xu
Adding spinlock protection during getting cap reservation ralated fields so that the numbers match below BUG_ON condition in the code. BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count + mdsc->caps_reserve_count + mdsc->caps_avail_count); Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: use seq_show_option for string type optionsChengguang Xu
Using seq_show_option to replace seq_printf for string type options. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: change permission for readonly debugfs entriesChengguang Xu
Remove write permission for debugfs entries which only have readonly function. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: keep consistent semantic in fscache related option combinationChengguang Xu
When specifying multiple fscache related options, the result isn't always the same as option order, this fix will keep strict consistent meaning by order. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: add newline to end of debug message formatChengguang Xu
Some of dout format do not include newline in the end, fix for the files which are in fs/ceph and net/ceph directories, and changing printk to dout for printing debug info in super.c Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: move ceph_calc_file_object_mapping() to striper.cIlya Dryomov
ceph_calc_file_object_mapping() has nothing to do with osdmaps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: change ceph_calc_file_object_mapping() signatureIlya Dryomov
- make it void - xlen (object extent length) out parameter should be u32 because only a single stripe unit is mapped at a time Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-01ext4: force revalidation of directory pointer after seekdir(2)Theodore Ts'o
A malicious user could force the directory pointer to be in an invalid spot by using seekdir(2). Use the mechanism we already have to notice if the directory has changed since the last time we called ext4_readdir() to force a revalidation of the pointer. Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-04-01cifs: smbd: disconnect transport on RDMA errorsLong Li
On RDMA errors, transport should disconnect the RDMA CM connection. This will notify the upper layer, and it will attempt transport reconnect. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01cifs: smbd: avoid reconnect lockupLong Li
During transport reconnect, other processes may have registered memory and blocked on transport. This creates a deadlock situation because the transport resources can't be freed, and reconnect is blocked. Fix this by returning to upper layer on timeout. Before returning, transport status is set to reconnecting so other processes will release memory registration resources. Upper layer will retry the reconnect. This is not in fast I/O path so setting the timeout to 5 seconds. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01Don't log confusing message on reconnect by defaultSteve French
Change the following message (which can occur on reconnect) from a warning to an FYI message. It is confusing to users. [58360.523634] CIFS VFS: Free previous auth_key.response = 00000000a91cdc84 By default this message won't show up on reconnect unless the user bumps up the log level to include FYI messages. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-01Don't log expected error on DFS referral requestSteve French
STATUS_FS_DRIVER_REQUIRED is expected when DFS is not turned on on the server. Do not log it on DFS referral response. It clutters the dmesg log unnecessarily at mount time. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
2018-04-01fs: cifs: Replace _free_xid call in cifs_root_iget functionPhillip Potter
Modify end of cifs_root_iget function in fs/cifs/inode.c to call free_xid(xid) instead of _free_xid(xid), thereby allowing debug notification of this action when enabled. Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01SMB3.1.1 dialect is no longer experimentalSteve French
SMB3.1.1 is a very important dialect, with much improved security. We can remove the ExPERIMENTAL comments about it. It is widely supported by servers. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01Tree connect for SMB3.1.1 must be signed for non-encrypted sharesSteve French
SMB3.1.1 tree connect was only being signed when signing was mandatory but needs to always be signed (for non-guest users). See MS-SMB2 section 3.2.4.1.1 Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-04-01fix smb3-encryption breakage when CONFIG_DEBUG_SG=yRonnie Sahlberg
We can not use the standard sg_set_buf() fucntion since when CONFIG_DEBUG_SG=y this adds a check that will BUG_ON for cifs.ko when we pass it an object from the stack. Create a new wrapper smb2_sg_set_buf() which avoids doing that particular check and use it for smb3 encryption instead. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01CIFS: fix sha512 check in cifs_crypto_secmech_releaseGustavo A. R. Silva
It seems this is a copy-paste error and that the proper variable to use in this particular case is _sha512_ instead of _md5_. Addresses-Coverity-ID: 1465358 ("Copy-paste error") Fixes: 1c6614d229e7 ("CIFS: add sha512 secmech") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01CIFS: implement v3.11 preauth integrityAurelien Aptel
SMB3.11 clients must implement pre-authentification integrity. * new mechanism to certify requests/responses happening before Tree Connect. * supersedes VALIDATE_NEGOTIATE * fixes signing for SMB3.11 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01CIFS: add sha512 secmechAurelien Aptel
* prepare for SMB3.11 pre-auth integrity * enable sha512 when SMB311 is enabled in Kconfig * add sha512 as a soft dependency Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01CIFS: refactor crypto shash/sdesc allocation&freeAurelien Aptel
shash and sdesc and always allocated and freed together. * abstract this in new functions cifs_alloc_hash() and cifs_free_hash(). * make smb2/3 crypto allocation independent from each other. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01cifs: fix memory leak in SMB2_open()Ronnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01CIFS: SMBD: fix spelling mistake: "faield" and "legnth"Colin Ian King
Trivial fix to spelling mistake in log_rdma_send and log_rdma_mr message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30ext4: add extra checks to ext4_xattr_block_get()Theodore Ts'o
Add explicit checks in ext4_xattr_block_get() just in case the e_value_offs and e_value_size fields in the the xattr block are corrupted in memory after the buffer_verified bit is set on the xattr block. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2018-03-31btrfs: lift errors from add_extent_changeset to the callersDavid Sterba
The missing error handling in add_extent_changeset was hidden, so make it at least visible in the callers. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31Btrfs: print error messages when failing to read treesLiu Bo
When mount fails to read trees like fs tree, checksum tree, extent tree, etc, there is not enough information about where went wrong. With this, messages like "BTRFS warning (device sdf): failed to read root (objectid=7): -5" would help us a bit. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: user proper type for btrfs_mask_flags flagsDavid Sterba
All users pass a local unsigned int and not the __uXX types that are supposed to be used for userspace interfaces. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: split dev-replace locking helpers for read and writeDavid Sterba
The current calls are unclear in what way btrfs_dev_replace_lock takes the locks, so drop the argument, split the helpers and use similar naming as for read and write locks. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: remove stale comments about fs_mutexDavid Sterba
The fs_mutex has been killed in 2008, a213501153fd66e2 ("Btrfs: Replace the big fs_mutex with a collection of other locks"), still remembered in some comments. We don't have any extra needs for locking in the ACL handlers. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: use RCU in btrfs_show_devname for device list traversalDavid Sterba
The show_devname callback is used to print device name in /proc/self/mounts, we need to traverse the device list consistently and read the name that's copied to a seq buffer so we don't need further locking. If the first device is being deleted at the same time, the RCU will allow us to read the device name, though it will become stale right after the RCU protection ends. This is unavoidable and the user can expect that the device will disappear from the filesystem's list at some point. The device_list_mutex was pretty heavy as it is used eg. for writing superblock and a few other IO related contexts. This can stall any application that reads the proc file for no reason. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: update barrier in should_cow_blockDavid Sterba
Once there was a simple int force_cow that was used with the plain barriers, and then converted to a bit, so we should use the appropriate barrier helper. Other variables in the complex if condition do not depend on a barrier, so we should be fine in case the atomic barrier becomes a no-op. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: use lockdep_assert_held for mutexesDavid Sterba
Using lockdep_assert_held is preferred, replace mutex_is_locked. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: use lockdep_assert_held for spinlocksDavid Sterba
Using lockdep_assert_held is preferred, replace assert_spin_locked. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Validate child tree block's level and first keyQu Wenruo
We have several reports about node pointer points to incorrect child tree blocks, which could have even wrong owner and level but still with valid generation and checksum. Although btrfs check could handle it and print error message like: leaf parent key incorrect 60670574592 Kernel doesn't have enough check on this type of corruption correctly. At least add such check to read_tree_block() and btrfs_read_buffer(), where we need two new parameters @level and @first_key to verify the child tree block. The new @level check is mandatory and all call sites are already modified to extract expected level from its call chain. While @first_key is optional, the following call sites are skipping such check: 1) Root node/leaf As ROOT_ITEM doesn't contain the first key, skip @first_key check. 2) Direct backref Only parent bytenr and level is known and we need to resolve the key all by ourselves, skip @first_key check. Another note of this verification is, it needs extra info from nodeptr or ROOT_ITEM, so it can't fit into current tree-checker framework, which is limited to node/leaf boundary. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: tests/qgroup: Fix wrong tree backref levelQu Wenruo
The extent tree of the test fs is like the following: BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919 item 0 key (4096 168 4096) itemoff 3944 itemsize 51 extent refs 1 gen 1 flags 2 tree block key (68719476736 0 0) level 1 ^^^^^^^ ref#0: tree block backref root 5 And it's using an empty tree for fs tree, so there is no way that its level can be 1. For REAL (created by mkfs) fs tree backref with no skinny metadata, the result should look like: item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51 refs 1 gen 4 flags TREE_BLOCK tree block key (256 INODE_ITEM 0) level 0 ^^^^^^^ tree block backref root 5 Fix the level to 0, so it won't break later tree level checker. Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31Btrfs: fix copy_items() return value when logging an inodeFilipe Manana
When logging an inode, at tree-log.c:copy_items(), if we call btrfs_next_leaf() at the loop which checks for the need to log holes, we need to make sure copy_items() returns the value 1 to its caller and not 0 (on success). This is because the path the caller passed was released and is now different from what is was before, and the caller expects a return value of 0 to mean both success and that the path has not changed, while a return value of 1 means both success and signals the caller that it can not reuse the path, it has to perform another tree search. Even though this is a case that should not be triggered on normal circumstances or very rare at least, its consequences can be very unpredictable (especially when replaying a log tree). Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31Btrfs: fix fsync after hole punching when using no-holes featureFilipe Manana
When we have the no-holes mode enabled and fsync a file after punching a hole in it, we can end up not logging the whole hole range in the log tree. This happens if the file has extent items that span more than one leaf and we punch a hole that covers a range that starts in a leaf but does not go beyond the offset of the first extent in the next leaf. Example: $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb $ mount /dev/sdb /mnt $ for ((i = 0; i <= 831; i++)); do offset=$((i * 2 * 256 * 1024)) xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \ /mnt/foobar >/dev/null done $ sync # We now have 2 leafs in our filesystem fs tree, the first leaf has an # item corresponding the extent at file offset 216530944 and the second # leaf has a first item corresponding to the extent at offset 217055232. # Now we punch a hole that partially covers the range of the extent at # offset 216530944 but does go beyond the offset 217055232. $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power fail> # mount to replay the log $ mount /dev/sdb /mnt # Before this patch, only the subrange [216658016, 216662016[ (length of # 4000 bytes) was logged, leaving an incorrect file layout after log # replay. Fix this by checking if there is a hole between the last extent item that we processed and the first extent item in the next leaf, and if there is one, log an explicit hole extent item. Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: use helper to set ulist aux from a qgroupDavid Sterba
We have a nice helper to do proper casting of a qgroup to a ulist aux value. And several places that could make use of it. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"Qu Wenruo
This reverts commit 48a89bc4f2ceab87bc858a8eb189636b09c846a7. The idea to commit transaction and free some space after hitting qgroup limit is good, although the problem is it can easily cause deadlocks. One deadlock example is caused by trying to flush data while still holding it: Call Trace: __schedule+0x49d/0x10f0 schedule+0xc6/0x290 schedule_timeout+0x187/0x1c0 wait_for_completion+0x204/0x3a0 btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs] qgroup_reserve+0x913/0xa10 [btrfs] btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs] btrfs_check_data_free_space+0x96/0xd0 [btrfs] __btrfs_buffered_write+0x3ac/0xd40 [btrfs] btrfs_file_write_iter+0x62a/0xba0 [btrfs] __vfs_write+0x320/0x430 vfs_write+0x107/0x270 SyS_write+0xbf/0x150 do_syscall_64+0x1b0/0x3d0 entry_SYSCALL64_slow_path+0x25/0x25 Another can be caused by trying to commit one transaction while nesting with trans handle held by ourselves: btrfs_start_transaction() |- btrfs_qgroup_reserve_meta_pertrans() |- qgroup_reserve() |- btrfs_join_transaction() |- btrfs_commit_transaction() The retry is causing more problems than exppected when limit is enabled. At least a graceful EDQUOT is way better than deadlock. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Update trace events for metadata reservationQu Wenruo
Now trace_qgroup_meta_reserve() will have extra type parameter. And introduce two new trace events: 1) trace_qgroup_meta_free_all_pertrans() For btrfs_qgroup_free_meta_all_pertrans() 2) trace_qgroup_meta_convert() For btrfs_qgroup_convert_reserved_meta() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved spaceQu Wenruo
For quota disabled->enable case, it's possible that at reservation time quota was not enabled so no bytes were really reserved, while at release time, quota was enabled so we will try to release some bytes we didn't really own. Such situation can cause metadata reserveation underflow, for both types, also less possible for per-trans type since quota enable will commit transaction. To address this, record qgroup meta reserved bytes into root::qgroup_meta_rsv_pertrans and ::prealloc. So at releasing time we won't free any bytes we didn't reserve. For DATA, it's already handled by io_tree, so nothing needs to be done there. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and itemQu Wenruo
Quite similar for delalloc, some modification to delayed-inode and delayed-item reservation. Also needs extra parameter for release case to distinguish normal release and error release. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-30ext4: add bounds checking to ext4_xattr_find_entry()Theodore Ts'o
Add some paranoia checks to make sure we don't stray beyond the end of the valid memory region containing ext4 xattr entries while we are scanning for a match. Also rename the function to xattr_find_entry() since it is static and thus only used in fs/ext4/xattr.c Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2018-03-31btrfs: qgroup: Use separate meta reservation type for delallocQu Wenruo
Before this patch, btrfs qgroup is mixing per-transcation meta rsv with preallocated meta rsv, making it quite easy to underflow qgroup meta reservation. Since we have the new qgroup meta rsv types, apply it to delalloc reservation. Now for delalloc, most of its reserved space will use META_PREALLOC qgroup rsv type. And for callers reducing outstanding extent like btrfs_finish_ordered_io(), they will convert corresponding META_PREALLOC reservation to META_PERTRANS. This is mainly due to the fact that current qgroup numbers will only be updated in btrfs_commit_transaction(), that's to say if we don't keep such placeholder reservation, we can exceed qgroup limitation. And for callers freeing outstanding extent in error handler, we will just free META_PREALLOC bytes. This behavior makes callers of btrfs_qgroup_release_meta() or btrfs_qgroup_convert_meta() to be aware of which type they are. So in this patch, btrfs_delalloc_release_metadata() and its callers get an extra parameter to info qgroup to do correct meta convert/release. The good news is, even we use the wrong type (convert or free), it won't cause obvious bug, as prealloc type is always in good shape, and the type only affects how per-trans meta is increased or not. So the worst case will be at most metadata limitation can be sometimes exceeded (no convert at all) or metadata limitation is reached too soon (no free at all). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANSQu Wenruo
For meta_prealloc reservation users, after btrfs_join_transaction() caller will modify tree so part (or even all) meta_prealloc reservation should be converted to meta_pertrans until transaction commit time. This patch introduces a new function, btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC reservation user. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroupQu Wenruo
Since qgroup has seperate metadata reservation types now, we can completely get rid of the old root->qgroup_meta_rsv, which mostly acts as current META_PERTRANS reservation type. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertransQu Wenruo
Btrfs uses 2 different methods to reseve metadata qgroup space. 1) Reserve at btrfs_start_transaction() time This is quite straightforward, caller will use the trans handler allocated to modify b-trees. In this case, reserved metadata should be kept until qgroup numbers are updated. 2) Reserve by using block_rsv first, and later btrfs_join_transaction() This is more complicated, caller will reserve space using block_rsv first, and then later call btrfs_join_transaction() to get a trans handle. In this case, before we modify trees, the reserved space can be modified on demand, and after btrfs_join_transaction(), such reserved space should also be kept until qgroup numbers are updated. Since these two types behave differently, split the original "META" reservation type into 2 sub-types: META_PERTRANS: For above case 1) META_PREALLOC: For reservations that happened before btrfs_join_transaction() of case 2) NOTE: This patch will only convert existing qgroup meta reservation callers according to its situation, not ensuring all callers are at correct timing. Such fix will be added in later patches. Signed-off-by: Qu Wenruo <wqu@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>