linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-11-01	btrfs: remove BUG_ON in btrfs_rm_dev_replace_free_srcdev()	Anand Jain
	That was only an extra check to tackle a few bugs around this area, now its safe to remove it. Replace it by an ASSERT. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01	btrfs: allow setting zlib compression level via :9	Adam Borowski
	This is bikeshedding, but it seems people are drastically more likely to understand "zlib:9" as compression level rather than an algorithm version compared to "zlib9". Based on feedback on the mailinglist, the ":9" will be the only accepted syntax. The level must be a single digit. Unrecognized format will result to the default, for forward compatibility in a similar way the compression algorithm specifier was relaxed in commit a7164fa4e055daf6368c ("btrfs: prepare for extensions in compression options"). Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> [ tighten the accepted format ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01	btrfs: allow to set compression level for zlib	David Sterba
	Preliminary support for setting compression level for zlib, the following works: $ mount -o compess=zlib # default $ mount -o compess=zlib0 # same $ mount -o compess=zlib9 # level 9, slower sync, less data $ mount -o compess=zlib1 # level 1, faster sync, more data $ mount -o remount,compress=zlib3 # level set by remount The compress-force works the same as compress'. The level is visible in the same format in /proc/mounts. Level set via file property does not work yet. Required patch: "btrfs: prepare for extensions in compression options" Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01	quota: fix potential infinite loop	zhangyi (F)
	In dquot_writeback_dquots(), we write back dquot from dirty dquots list. There is a potential infinite loop if ->write_dquot() failure and forget remove dquot from the list. This patch clear dirty bit anyway to avoid it. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	xfs: remove redundant assignment to variable bit	Colin Ian King
	Variable bit is being assigned a value that is never read, hence the assignment is redundant and can be removed. Cleans up clang warning: fs/xfs/libxfs/xfs_rtbitmap.c:675:3: warning: Value stored to 'bit' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-31	fscrypt: lock mutex before checking for bounce page pool	Eric Biggers
	fscrypt_initialize(), which allocates the global bounce page pool when an encrypted file is first accessed, uses "double-checked locking" to try to avoid locking fscrypt_init_mutex. However, it doesn't use any memory barriers, so it's theoretically possible for a thread to observe a bounce page pool which has not been fully initialized. This is a classic bug with "double-checked locking". While "only a theoretical issue" in the latest kernel, in pre-4.8 kernels the pointer that was checked was not even the last to be initialized, so it was easily possible for a crash (NULL pointer dereference) to happen. This was changed only incidentally by the large refactor to use fs/crypto/. Solve both problems in a trivial way that can easily be backported: just always take the mutex. It's theoretically less efficient, but it shouldn't be noticeable in practice as the mutex is only acquired very briefly once per encrypted file. Later I'd like to make this use a helper macro like DO_ONCE(). However, DO_ONCE() runs in atomic context, so we'd need to add a new macro that allows blocking. Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-31	isofs: use unsigned char types consistently	Arnd Bergmann
	Based on the discussion about the signed character field for the year, I went through all fields in the iso9660 and rockridge standards to see whether they should used signed or unsigned characters. Only a single 8-bit value is defined as signed per 'section 7.1.2': the timezone offset in a timestamp, this has always been handled correctly through explicit sign-extension. All others are either '7.1.1 8-bit unsigned numerical values' or composite fields. I also read the linux source code and came to the same conclusion, also I could not find any other part of the implementation that actually behaves differently for signed or unsigned values. Since it is still ambigous to use plain 'char' in interface definitions, I'm changing all fields representing numbers and reserved bytes to the unambiguous '__u8'. Fields that hold actual strings are left as 'char' arrays. I built the code with '-Wpointer-sign -Wsign-compare' to see if anything got left out, but couldn't find anything wrong with the remaining warnings. This patch should not change runtime behavior and does not need to be backported. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	isofs: fix timestamps beyond 2027	Arnd Bergmann
	isofs uses a 'char' variable to load the number of years since 1900 for an inode timestamp. On architectures that use a signed char type by default, this results in an invalid date for anything beyond 2027. This changes the function argument to a 'u8' array, which is defined the same way on all architectures, and unambiguously lets us use years until 2155. This should be backported to all kernels that might still be in use by that date. Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t	Elena Reshetova
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable fsnotify_mark.refcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fanotify: clean up CONFIG_FANOTIFY_ACCESS_PERMISSIONS ifdefs	Miklos Szeredi
	The only negative from this patch should be an addition of 32bytes to 'struct fsnotify_group' if CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not defined. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: clean up fsnotify()	Miklos Szeredi
	Use helpers to get first and next marks from connector. Also get rid of inode_node/vfsmount_node local variables, which just refers to the same objects as iter_info. There was an srcu_dereference() for foo_node, but that's completely superfluous since we've already done it when obtaining foo_node. Also get rid of inode_group/vfsmount_group local variables; checking against non-NULL for these is the same as checking against non-NULL inode_mark/vfsmount_mark. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fanotify: fix fsnotify_prepare_user_wait() failure	Miklos Szeredi
	If fsnotify_prepare_user_wait() fails, we leave the event on the notification list. Which will result in a warning in fsnotify_destroy_event() and later use-after-free. Instead of adding a new helper to remove the event from the list in this case, I opted to move the prepare/finish up into fanotify_handle_event(). This will allow these to be moved further out into the generic code later, and perhaps let us move to non-sleeping RCU. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 05f0e38724e8 ("fanotify: Release SRCU lock when waiting for userspace response") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: fix pinning group in fsnotify_prepare_user_wait()	Miklos Szeredi
	Blind increment of group's user_waits is not enough, we could be far enough in the group's destruction that it isn't taken into account (i.e. grabbing the mark ref afterwards doesn't guarantee that it was the ref coming from the _group_ that was grabbed). Instead we need to check (under lock) that the mark is still attached to the group after having obtained a ref to the mark. If not, skip it. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: pin both inode and vfsmount mark	Miklos Szeredi
	We may fail to pin one of the marks in fsnotify_prepare_user_wait() when dropping the srcu read lock, resulting in use after free at the next iteration. Solution is to store both marks in iter_info instead of just the one we'll be sending the event for. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: clean up fsnotify_prepare/finish_user_wait()	Miklos Szeredi
	This patch doesn't actually fix any bug, just paves the way for fixing mark and group pinning. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: convert fsnotify_group.refcnt from atomic_t to refcount_t	Elena Reshetova
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable fsnotify_group.refcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly	Jan Kara
	When fsnotify_add_mark_locked() fails it cleans up the mark it was adding. Since the mark is already visible in group's list, we should protect update of mark->flags with mark->lock. I'm not aware of any real issues this could cause (since we also hold group->mark_mutex) but better be safe and obey locking rules properly. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	dnotify: Handle errors from fsnotify_add_mark_locked() in fcntl_dirnotify()	Jan Kara
	fsnotify_add_mark_locked() can fail but we do not check its return value. This didn't matter before commit 9dd813c15b2c "fsnotify: Move mark list head from object into dedicated structure" as none of possible failures could happen for dnotify but after that commit -ENOMEM can be returned. Handle this error properly in fcntl_dirnotify() as otherwise we just hit BUG_ON(dn_mark->dn) in dnotify_free_mark(). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reported-by: syzkaller Fixes: 9dd813c15b2c101168808d4f5941a29985758973 Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-31	treewide: Fix function prototypes for module_param_call()	Kees Cook
	Several function prototypes for the set/get functions defined by module_param_call() have a slightly wrong argument types. This fixes those in an effort to clean up the calls when running under type-enforced compiler instrumentation for CFI. This is the result of running the following semantic patch: @match_module_param_call_function@ declarer name module_param_call; identifier _name, _set_func, _get_func; expression _arg, _mode; @@ module_param_call(_name, _set_func, _get_func, _arg, _mode); @fix_set_prototype depends on match_module_param_call_function@ identifier match_module_param_call_function._set_func; identifier _val, _param; type _val_type, _param_type; @@ int _set_func( -_val_type _val +const char * _val , -_param_type _param +const struct kernel_param * _param ) { ... } @fix_get_prototype depends on match_module_param_call_function@ identifier match_module_param_call_function._get_func; identifier _val, _param; type _val_type, _param_type; @@ int _get_func( -_val_type _val +char * _val , -_param_type _param +const struct kernel_param * _param ) { ... } Two additional by-hand changes are included for places where the above Coccinelle script didn't notice them: drivers/platform/x86/thinkpad_acpi.c fs/lockd/svc.c Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-31	gfs2: Allow gfs2_xattr_set to be called with the glock held	Andreas Gruenbacher
	On the following call path: gfs2_setattr -> setattr_prepare -> ... -> cap_inode_killpriv -> ... -> gfs2_xattr_set the glock is locked in gfs2_setattr, so check for recursive locking in gfs2_xattr_set as gfs2_xattr_get already does. While at it, get rid of need_unlock in gfs2_xattr_get. Fixes xfstest generic/093. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	gfs2: Add support for statx inode flags	Andreas Gruenbacher
	Add support for the STATX_ATTR_ flags in statx. (Compression, encryption, and the nodump flag are not supported by gfs2.) Partially fixes xfstest generic/424. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	gfs2: Fix and clean up {GET,SET}FLAGS ioctl	Andreas Gruenbacher
	Switch to a simple array for mapping between the FS__FL and GFS_DIF_ flags. Clarify how the mapping between FS_JOURNAL_DATA_FL and the filesystem flags works. The GFS2_DIF_SYSTEM flag cannot be set from user space, so remove it from GFS2_FLAGS_USER_SET. Fail with -EINVAL when trying to set flags that are not supported instead of silently ignoring those flags. Partially fixes xfstest generic/424. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	gfs2: Fix a harmless typo	Andreas Gruenbacher
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	gfs2: Fix xattr fsync	Andreas Gruenbacher
	Make sure that changing xattrs marks the corresponding inode dirty so that a subsequent fsync will sync those changes to disk. We set I_DIRTY_SYNC as well as I_DIRTY_DATASYNC so that both fsync and fdatasync will sync xattr changes: xattrs can contain information critical to how the data can be accessed, so we don't want fdatasync to skip them. Fixes xfstest generic/066. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	GFS2: Take inode off order_write list when setting jdata flag	Bob Peterson
	This patch fixes a deadlock caused when the jdata flag is set for inodes that are already on the ordered write list. Since it is on the ordered write list, log_flush calls gfs2_ordered_write which calls filemap_fdatawrite. But since the inode had the jdata flag set, that calls gfs2_jdata_writepages, which tries to start a new transaction. A new transaction cannot be started because it tries to acquire the log_flush rwsem which is already locked by the log flush operation. The bottom line is: We cannot switch an inode from ordered to jdata until we eliminate any ordered data pages (via log flush) or any log_flush operation afterward will create the circular dependency above. So we need to flush the log before setting the diskflags to switch the file mode, then we need to remove the inode from the ordered writes list. Before this patch, the log flush was done for jdata->ordered, but that's wrong. If we're going from jdata to ordered, we don't need to call gfs2_log_flush because the call to filemap_fdatawrite will do it for us: filemap_fdatawrite() -> __filemap_fdatawrite_range() __filemap_fdatawrite_range() -> do_writepages() do_writepages() -> gfs2_jdata_writepages() gfs2_jdata_writepages() -> gfs2_log_flush() This patch modifies function do_gfs2_set_flags so that if a file has its jdata flag set, and it's already on the ordered write list, the log will be flushed and it will be removed from the list before setting the flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31	GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL	Bob Peterson
	In function gfs2_write_inode, starting with patch a9185b41a4f84, we only flush the log and call filemap_fdatawait if we're passed in a wbc sync_mode of WB_SYNC_ALL. We also need to do these things if we're evicting a jdata inode, because we might have jdata pages still attached to bufdata descriptors that need to be revoked, but by the time it gets to evict() it's too late to start a new transaction. This patch changes it to treat jdata inodes as if WB_SYNC_ALL had been specified. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31	gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap	Andreas Gruenbacher
	So far, lseek on gfs2 did not report holes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31	GFS2: Switch fiemap implementation to use iomap	Bob Peterson
	This patch switches GFS2's implementation of fiemap from the old block_map code to the new iomap interface. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31	GFS2: Implement iomap for block_map	Bob Peterson
	This patch implements iomap for block mapping, and switches the block_map function to use it under the covers. The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has reached a "metadata boundary" and fetching the next mapping is likely to incur an additional I/O. This flag is used for setting the bh buffer boundary flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31	GFS2: Make height info part of metapath	Bob Peterson
	This patch eliminates height parameters from function gfs2_bmap_alloc. Function find_metapath determines the metapath's "find height", also known as the desired height. Function lookup_metapath determines the metapath's "actual height", previously known as starting height or sheight. Function gfs2_bmap_alloc now gets both height values from the metapath. This simplification was done as a step toward switching the block_map functions to using iomap. The bh_map responsibilities are also removed from function gfs2_bmap_alloc for the same reason. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-30	jfs: remove increment of i_version counter	Jeff Layton
	JFS does not set SB_I_VERSION and doesn't use the i_version counter internally. Just remove this increment. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2017-10-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Several conflicts here. NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to nfp_fl_output() needed some adjustments because the code block is in an else block now. Parallel additions to net/pkt_cls.h and net/sch_generic.h A bug fix in __tcp_retransmit_skb() conflicted with some of the rbtree changes in net-next. The tc action RCU callback fixes in 'net' had some overlap with some of the recent tcf_block reworking. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30	btrfs: Replace opencoded sizes with their symbolic constants	Nikolay Borisov
	Currently btrfs' code uses a mix of opencoded sizes and defines from sizes.h. Let's unifiy the code base to always use the symbolic constants. No functional changes Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: Use bd_dev to generate index when dev_state_hashtable add items.	Gu JinXiang
	Fix missing change from commit f8f84b2dfda5 ("btrfs: index check-integrity state hash by a dev_t"). Function btrfsic_dev_state_hashtable_lookup uses dev_t to generate hashval when look in up a btrfsic_dev_state in hash table. So when we add a btrfsic_dev_state into the hash table, it should also use dev_t. Reproducer of this bug: Use MOUNT_OPTIONS="-o check_int" when running xfstest, device can not be mounted successfully. So xfstest can not run. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: fix false EIO for missing device	Anand Jain
	When one of the device is missing, bbio_error() takes care of setting the error status. And if its only IO that is pending in that stripe, it fails to check the status of the other IO at %bbio_error before setting the error %bi_status for the %orig_bio. Fix this by checking if %bbio->error has exceeded the %bbio->max_errors. Reproducer as below fdatasync error is seen intermittently. mount -o degraded /dev/sdc /btrfs dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error The reason for the intermittences of the problem is because the following conditions have to be met, which depends on timing: In btrfs_map_bio() - the RAID1 the missing device has to be at %dev_nr = 1 In bbio_error() . before bbio_error() is called the bio of the not-missing device at %dev_nr = 0 must be completed so that the below condition is true if (atomic_dec_and_test(&bbio->stripes_pending)) { Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: use need_full_stripe() in __btrfs_map_block()	Anand Jain
	A cleanup patch, use need_full_stripe() to replace the open code. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: cleanup extent locking sequence	Goldwyn Rodrigues
	Code cleanup for better understanding: Variable needs_unlock to be called extent_locked to show state as opposed to action. Changed the type to int, to reduce code in the critical path. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: use BLK_STS defines where needed	Anand Jain
	At few places we could use BLK_STS_OK and BLK_STS_NOSUPP. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Satoru Taekeuchi <satoru.takeuchi@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ dropped first hunk btrfs_endio_direct_read ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: add assertions for releasing trans handle reservations	Josef Bacik
	These are useful for debugging problems where we mess with trans->block_rsv to make sure we're not screwing something up. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: remove type argument from comp_tree_refs	Josef Bacik
	We can get this from the ref we've passed in. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: remove delayed_ref_node from ref_head	Josef Bacik
	This is just excessive information in the ref_head, and makes the code complicated. It is a relic from when we had the heads and the refs in the same tree, which is no longer the case. With this removal I've cleaned up a bunch of the cruft around this old assumption as well. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: move all ref head cleanup to the helper function	Josef Bacik
	We do a couple different cleanup operations on the ref head. We adjust counters, we'll free any reserved space if we didn't end up using the ref, and we clear the pending csum bytes. Move all these disparate things into cleanup_ref_head and clean up the logic in __btrfs_run_delayed_refs so that it handles the !ref case a lot cleaner, as well as making run_one_delayed_ref() only deal with real refs and not the ref head. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: move ref_mod modification into the if (ref) logic	Josef Bacik
	We only use this logic if our ref isn't a ref_head, so move it up into the if (ref) case since we know that this is a normal ref and not a delayed ref head. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: breakout empty head cleanup to a helper	Josef Bacik
	Move this code out to a helper function to further simplivy __btrfs_run_delayed_refs. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: move extent_op cleanup to a helper	Josef Bacik
	Move the extent_op cleanup for an empty head ref to a helper function to help simplify __btrfs_run_delayed_refs. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: add a helper to return a head ref	Josef Bacik
	Simplify the error handling in __btrfs_run_delayed_refs by breaking out the code used to return a head back to the delayed_refs tree for processing into a helper function. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	Btrfs: only check delayed ref usage in should_end_transaction	Josef Bacik
	We were only doing btrfs_check_space_for_delayed_refs() if the metadata space was full, ie we couldn't allocate chunks. This assumes we'll be able to allocate chunks during transaction commit, but since nothing does a LIMIT flush during the transaction commit this won't actually happen unless we happen to run shy of actual space. We already take into account a full fs in btrfs_check_space_for_delayed_refs() so just kill this extra check to make sure we're ending the transaction when we need to. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	Btrfs: add a extent ref verify tool	Josef Bacik
	We were having corruption issues that were tied back to problems with the extent tree. In order to track them down I built this tool to try and find the culprit, which was pretty successful. If you compile with this tool on it will live verify every ref update that the fs makes and make sure it is consistent and valid. I've run this through with xfstests and haven't gotten any false positives. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update error messages, add fixup from Dan Carpenter to handle errors of read_tree_block ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: pass root to various extent ref mod functions	Josef Bacik
	We need the actual root for the ref verifier tool to work, so change these functions to pass the root around instead. This will be used in a subsequent patch. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30	btrfs: add ref-verify mount option	Josef Bacik
	This adds the infrastructure for turning ref verify on and off for a mount, to be used by a later patch. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhnance btrfs_print_mod_info to print if ref-verify is compiled in ] Signed-off-by: David Sterba <dsterba@suse.com>