linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-10-03	xfs: create refcount update intent log items	Darrick J. Wong
	Create refcount update intent/done log items to record redo information in the log. Because we need to roll transactions between updating the bmbt mapping and updating the reverse mapping, we also have to track the status of the metadata updates that will be recorded in the post-roll transactions, just in case we crash before committing the final transaction. This mechanism enables log recovery to finish what was already started. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: add refcount btree operations	Darrick J. Wong
	Implement the generic btree operations required to manipulate refcount btree blocks. The implementation is similar to the bmapbt, though it will only allocate and free blocks from the AG. Since the refcount root and level fields are separate from the existing roots and levels array, they need a separate logging flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: fix logging of AGF refcount btree fields] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: account for the refcount btree in the alloc/free log reservation	Darrick J. Wong
	Every time we allocate or free a data extent, we might need to split the refcount btree. Reserve some blocks in the transaction to handle this possibility. Even though the deferred refcount code can roll a transaction to avoid overloading the transaction, we can still exceed the reservation. Certain pathological workloads (1k blocks, no cowextsize hint, random directio writes), cause a perfect storm wherein a refcount adjustment of a large range of blocks causes full tree splits in two separate extents in two separate refcount tree blocks; allocating new refcount tree blocks causes rmap btree splits; and all the allocation activity causes the freespace btrees to split, blowing the reservation. (Reproduced by generic/167 over NFS atop XFS) Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick.wong@oracle.com: add commit message] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-10-03	xfs: add refcount btree support to growfs	Darrick J. Wong
	Modify the growfs code to initialize new refcount btree blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: define the on-disk refcount btree format	Darrick J. Wong
	Start constructing the refcount btree implementation by establishing the on-disk format and everything needed to read, write, and manipulate the refcount btree blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: refcount btree add more reserved blocks	Darrick J. Wong
	Since XFS reserves a small amount of space in each AG as the minimum free space needed for an operation, save some more space in case we touch the refcount btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: introduce refcount btree definitions	Darrick J. Wong
	Add new per-AG refcount btree definitions to the per-AG structures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: define tracepoints for refcount btree activities	Darrick J. Wong
	Define all the tracepoints we need to inspect the refcount btree runtime operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	xfs: return an error when an inline directory is too small	Darrick J. Wong
	If the size of an inline directory is so small that it doesn't even cover the required header size, return an error to userspace instead of ASSERTing and returning 0 like everything's ok. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03	vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks	Darrick J. Wong
	Add a new fallocate mode flag that explicitly unshares blocks on filesystems that support such features. The new flag can only be used with an allocate-mode fallocate call. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-10-03	ceph: use list_move instead of list_del/list_add	Wei Yongjun
	Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03	ceph: handle CEPH_SESSION_REJECT message	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03	ceph: avoid accessing / when mounting a subpath	Yan, Zheng
	Accessing / causes failuire if the client has caps that restrict path Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03	ceph: fix mandatory flock check	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03	ceph: remove warning when ceph_releasepage() is called on dirty page	NeilBrown
	If O_DIRECT writes are racing with buffered writes, then the call to invalidate_inode_pages2_range() can call ceph_releasepage() on dirty pages. Most filesystems hold inode_lock() across O_DIRECT writes so they do not suffer this race, but cephfs deliberately drops the lock, and opens a window for the race. This race can be triggered with the generic/036 test from the xfstests test suite. It doesn't happen every time, but it does happen often. As the possibilty is expected, remove the warning, and instead include the PageDirty() status in the debug message. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03	ceph: ignore error from invalidate_inode_pages2_range() in direct write	NeilBrown
	This call can fail if there are dirty pages. The preceding call to filemap_write_and_wait_range() will normally remove dirty pages, but as inode_lock() is not held over calls to ceph_direct_read_write(), it could race with non-direct writes and pages could be dirtied immediately after filemap_write_and_wait_range() returns If there are dirty pages, they will be removed by the subsequent call to truncate_inode_pages_range(), so having them here is not a problem. If the 'ret' value is left holding an error, then in the async IO case (aio_req is not NULL) the loop that would normally call ceph_osdc_start_request() will see the error in 'ret' and abort all requests. This doesn't seem like correct behaviour. So use separate 'ret2' instead of overloading 'ret'. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03	ceph: fix error handling of start_read()	Yan, Zheng
	If start_page() fails to add a page to page cache or fails to send OSD request. It should cal put_page() (instead of free_page()) for relevant pages. Besides, start_page() need to cancel fscache readpage if it fails to send OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
2016-10-03	fuse: limit xattr returned size	Miklos Szeredi
	Don't let userspace filesystem give bogus values for the size of xattr and xattr list. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03	Merge branch 'xfs-4.9-log-recovery-fixes' into for-next	Dave Chinner

2016-10-03	Merge branch 'iomap-4.9-dax' into for-next	Dave Chinner

2016-10-03	Merge branch 'xfs-4.9-delalloc-rework' into for-next	Dave Chinner

2016-10-03	Merge branch 'xfs-4.9-reflink-prep' into for-next	Dave Chinner

2016-10-03	Merge branch 'iomap-4.9-misc-fixes-1' into for-next	Dave Chinner

2016-10-03	xfs: update atime before I/O in xfs_file_dio_aio_read	Christoph Hellwig
	After the call to __blkdev_direct_IO the final reference to the file might have been dropped by aio_complete already, and the call to file_accessed might cause a use after free. Instead update the access time before the I/O, similar to how we update the time stamps before writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-03	ext2: fix possible integer truncation in ext2_iomap_begin	Christoph Hellwig
	For 32-bit architectures we need to cast first_block to u64 before shifting it left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-02	UBIFS: improve function-level documentation	Julia Lawall
	Fix various inconsistencies in the documentation associated with various functions. In the case of fs/ubifs/lprops.c, the second parameter of ubifs_get_lp_stats was renamed from st to lst in commit 84abf972ccff ("UBIFS: add re-mount debugging checks") In the case of fs/ubifs/lpt_commit.c, the excess variables have never existed in the associated functions since the code was introduced into the kernel. The others appear to be straightforward typos. Issues detected using Coccinelle (http://coccinelle.lip6.fr/) Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02	ubifs: fix host xattr_len when changing xattr	Pascal Eberhard
	When an extended attribute is changed, xattr_len of host inode is recalculated. ui->data_len is updated before computation and result is wrong. This patch adds a temporary variable to fix computation. To reproduce the issue: ~# > a.txt ~# attr -s an-attr -V a-value a.txt ~# attr -s an-attr -V a-bit-bigger-value a.txt Now host inode xattr_len is wrong. Forcing dbg_check_filesystem() generates the following error: [ 130.620140] UBIFS (ubi0:2): background thread "ubifs_bgt0_2" started, PID 565 [ 131.470790] UBIFS error (ubi0:2 pid 564): check_inodes: inode 646 has xattr size 240, but calculated size is 256 [ 131.481697] UBIFS (ubi0:2): dump of the inode 646 sitting in LEB 29:114688 [ 131.488953] magic 0x6101831 [ 131.492876] crc 0x9fce9091 [ 131.496836] node_type 0 (inode node) [ 131.501193] group_type 1 (in node group) [ 131.505788] sqnum 9278 [ 131.509191] len 160 [ 131.512549] key (646, inode) [ 131.516688] creat_sqnum 9270 [ 131.520133] size 0 [ 131.523264] nlink 1 [ 131.526398] atime 1053025857.0 [ 131.530574] mtime 1053025857.0 [ 131.534714] ctime 1053025906.0 [ 131.538849] uid 0 [ 131.542009] gid 0 [ 131.545140] mode 33188 [ 131.548636] flags 0x1 [ 131.551977] xattr_cnt 1 [ 131.555108] xattr_size 240 [ 131.558420] xattr_names 12 [ 131.561670] compr_type 0x1 [ 131.564983] data len 0 [ 131.568125] UBIFS error (ubi0:2 pid 564): dbg_check_filesystem: file-system check failed with error -22 [ 131.578074] CPU: 0 PID: 564 Comm: mount Not tainted 4.4.12-g3639bea54a #24 [ 131.585352] Hardware name: Generic AM33XX (Flattened Device Tree) [ 131.591918] [<c00151c0>] (unwind_backtrace) from [<c0012acc>] (show_stack+0x10/0x14) [ 131.600177] [<c0012acc>] (show_stack) from [<c01c950c>] (dbg_check_filesystem+0x464/0x4d0) [ 131.608934] [<c01c950c>] (dbg_check_filesystem) from [<c019f36c>] (ubifs_mount+0x14f8/0x2130) [ 131.617991] [<c019f36c>] (ubifs_mount) from [<c00d7088>] (mount_fs+0x14/0x98) [ 131.625572] [<c00d7088>] (mount_fs) from [<c00ed674>] (vfs_kern_mount+0x4c/0xd4) [ 131.633435] [<c00ed674>] (vfs_kern_mount) from [<c00efb5c>] (do_mount+0x988/0xb50) [ 131.641471] [<c00efb5c>] (do_mount) from [<c00f004c>] (SyS_mount+0x74/0xa0) [ 131.648837] [<c00f004c>] (SyS_mount) from [<c000fe20>] (ret_fast_syscall+0x0/0x3c) [ 131.665315] UBIFS (ubi0:2): background thread "ubifs_bgt0_2" stops Signed-off-by: Pascal Eberhard <pascal.eberhard@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02	ubifs: Use move variable in ubifs_rename()	Richard Weinberger
	...to make the code more consistent since we use move already in other places. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02	ubifs: Implement RENAME_EXCHANGE	Richard Weinberger
	Adds RENAME_EXCHANGE to UBIFS, the operation itself is completely disjunct from a regular rename() that's why we dispatch very early in ubifs_reaname(). RENAME_EXCHANGE used by the renameat2() system call allows the caller to exchange two paths atomically. Both paths have to exist and have to be on the same filesystem. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02	ubifs: Implement RENAME_WHITEOUT	Richard Weinberger
	Adds RENAME_WHITEOUT support to UBIFS, we implement it in the same way as ext4 and xfs do. For an overview of other ways to implement it please refere to commit 7dcf5c3e4527 ("xfs: add RENAME_WHITEOUT support"). Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02	ubifs: Implement O_TMPFILE	Richard Weinberger
	This patchs adds O_TMPFILE support to UBIFS. A temp file is a reference to an unlinked inode, a user holding the reference can use it. As soon it is being closed all data vanishes. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-01	fuse: remove duplicate cs->offset assignment	Miklos Szeredi
	Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: don't use fuse_ioctl_copy_user() helper	Miklos Szeredi
	The two invocations share little code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: Use generic xattr ops	Seth Forshee
	In preparation for posix acl support, rework fuse to use xattr handlers and the generic setxattr/getxattr/listxattr callbacks. Split the xattr code out into it's own file, and promote symbols to module-global scope as needed. Functionally these changes have no impact, as fuse still uses a single handler for all xattrs which uses the old callbacks. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: get rid of fc->flags	Miklos Szeredi
	Only two flags: "default_permissions" and "allow_other". All other flags are handled via bitfields. So convert these two as well. They don't change during the lifetime of the filesystem, so this is quite safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: listxattr: verify xattr list	Miklos Szeredi
	Make sure userspace filesystem is returning a well formed list of xattr names (zero or more nonzero length, null terminated strings). [Michael Theall: only verify in the nonzero size case] Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01	fuse: use timespec64	Miklos Szeredi
	And check for valid nsec value before passing into timespec64_to_jiffies(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: don't use ->d_time	Miklos Szeredi
	Store in memory pointed to by ->d_fsdata. Use ->d_init() to allocate the storage. Need to use RCU freeing because the data is used in RCU lookup mode. We could cast ->d_fsdata directly on 64bit archs, but I don't think this is worth the extra complexity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: Add posix ACL support	Seth Forshee
	Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with userspace. When it is set in the INIT response, ACL support will be enabled. ACL support also implies "default_permissions". When ACL support is enabled, the kernel will cache and have responsibility for enforcing ACLs. ACL xattrs will be passed to userspace, which is responsible for updating the ACLs in the filesystem, keeping the file mode in sync, and inheritance of default ACLs when new filesystem nodes are created. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: handle killpriv in userspace fs	Miklos Szeredi
	Only userspace filesystem can do the killing of suid/sgid without races. So introduce an INIT flag and negotiate support for this. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01	fuse: fix killing s[ug]id in setattr	Miklos Szeredi
	Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on chown and truncate, and (since writeback_cache) write. The problem with this is that it'll potentially restore a stale mode. The poper fix would be to let the filesystems do the suid/sgid clearing on the relevant operations. Possibly some are already doing it but there's no way we can detect this. So fix this by refreshing and recalculating the mode. Do this only if ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is still racy but the size of the window is reduced. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01	fuse: invalidate dir dentry after chmod	Miklos Szeredi
	Without "default_permissions" the userspace filesystem's lookup operation needs to perform the check for search permission on the directory. If directory does not allow search for everyone (this is quite rare) then userspace filesystem has to set entry timeout to zero to make sure permissions are always performed. Changing the mode bits of the directory should also invalidate the (previously cached) dentry to make sure the next lookup will have a chance of updating the timeout, if needed. Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-30	f2fs: introduce update_ckpt_flags to clean up	Jaegeuk Kim
	This patch add update_ckpt_flags() to clean up the flow. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30	f2fs: don't submit irrelevant page	Chao Yu
	While we call ->writepages, there are two cases: a. we didn't writeout any dirty pages, since they are writebacked by other thread concurrently. b. we writeout dirty pages, and have already submitted bio to block layer. In these cases, we don't need to do additional bio flushing unnecessarily, it may split bio in cache into smaller one. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30	f2fs: fix to commit bio cache after flushing node pages	Chao Yu
	In sync_node_pages, we won't check and commit last merged pages in private bio cache of f2fs, as these pages were taged as writeback, someone who is waiting for writebacking of the page will be blocked until the cache was committed by someone else. We need to commit node type bio cache to avoid potential deadlock or long delay of waiting writeback. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30	f2fs: introduce get_checkpoint_version for cleanup	Tiezhu Yang
	There exists almost same codes when get the value of pre_version and cur_version in function validate_checkpoint, this patch adds get_checkpoint_version to clean up redundant codes. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30	f2fs: remove dead variable	Sheng Yong
	Signed-off-by: Sheng Yong <shengyong1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30	f2fs: remove redundant io plug	Chao Yu
	Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>