summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-01-20Merge tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Three filesystem endianness fixes (one goes back to the 2.6 era, all marked for stable) and two fixups for this merge window's patches" * tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-client: ceph: fix bad endianness handling in parse_reply_info_extra ceph: fix endianness bug in frag_tree_split_cmp ceph: fix endianness of getattr mask in ceph_d_revalidate libceph: make sure ceph_aes_crypt() IV is aligned ceph: fix ceph_get_caps() interruption
2017-01-20Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fix from Miklos Szeredi: "This fixes a regression introduced in this cycle" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix possible use after free on redirect dir lookup
2017-01-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix two regressions, one introduced in 4.9 and a less recent one in 4.2" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix time_to_jiffies nsec sanity check fuse: clear FR_PENDING flag when moving requests out of pending queue
2017-01-19Merge tag 'xfs-for-linux-4.10-rc5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "I have a few more patches this week -- one to make the behavior of a quota id ioctl consistent with the other filesystems, and the rest improve validation of i_mode & i_size values coming into xfs so that we don't read off the ends of arrays or crash when handed garbage disk data. Summary: - inode i_mode sanitization - prevent overflows in getnextquota - minor build fixes" * tag 'xfs-for-linux-4.10-rc5-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix xfs_mode_to_ftype() prototype xfs: don't wrap ID in xfs_dq_get_next_id xfs: sanity check inode di_mode xfs: sanity check inode mode when creating new dentry xfs: replace xfs_mode_to_ftype table with switch statement xfs: add missing include dependencies to xfs_dir2.h xfs: sanity check directory inode di_size xfs: make the ASSERT() condition likely
2017-01-18xfs: fix xfs_mode_to_ftype() prototypeArnd Bergmann
A harmless warning just got introduced: fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] Removing the 'const' modifier avoids the warning and has no other effect. Fixes: 1fc4d33fed12 ("xfs: replace xfs_mode_to_ftype table with switch statement") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-18ceph: fix bad endianness handling in parse_reply_info_extraJeff Layton
sparse says: fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer The op value is __le32, so we need to convert it before comparing it. Cc: stable@vger.kernel.org # needs backporting for < 3.14 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-01-18ceph: fix endianness bug in frag_tree_split_cmpJeff Layton
sparse says: fs/ceph/inode.c:308:36: warning: incorrect type in argument 1 (different base types) fs/ceph/inode.c:308:36: expected unsigned int [unsigned] [usertype] a fs/ceph/inode.c:308:36: got restricted __le32 [usertype] frag fs/ceph/inode.c:308:46: warning: incorrect type in argument 2 (different base types) fs/ceph/inode.c:308:46: expected unsigned int [unsigned] [usertype] b fs/ceph/inode.c:308:46: got restricted __le32 [usertype] frag We need to convert these values to host-endian before calling the comparator. Fixes: a407846ef7c6 ("ceph: don't assume frag tree splits in mds reply are sorted") Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-01-18ceph: fix endianness of getattr mask in ceph_d_revalidateJeff Layton
sparse says: fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types) fs/ceph/dir.c:1248:50: expected restricted __le32 [usertype] mask fs/ceph/dir.c:1248:50: got int [signed] [assigned] mask Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry") Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-01-18ceph: fix ceph_get_caps() interruptionYan, Zheng
Commit 5c341ee32881 ("ceph: fix scheduler warning due to nested blocking") causes infinite loop when process is interrupted. Fix it. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-01-18ovl: fix possible use after free on redirect dir lookupAmir Goldstein
ovl_lookup_layer() iterates on path elements of d->name.name but also frees and allocates a new pointer for d->name.name. For the case of lookup in upper layer, the initial d->name.name pointer is stable (dentry->d_name), but for lower layers, the initial d->name.name can be d->redirect, which can be freed during iteration. [SzM] Keep the count of remaining characters in the redirect path and calculate the current position from that. This works becuase only the prefix is modified, the ending always stays the same. Fixes: 02b69b284cd7 ("ovl: lookup redirects") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-01-17xfs: don't wrap ID in xfs_dq_get_next_idEric Sandeen
The GETNEXTQOTA ioctl takes whatever ID is sent in, and looks for the next active quota for an user equal or higher to that ID. But if we are at the maximum ID and then ask for the "next" one, we may wrap back to zero. In this case, userspace may loop forever, because it will start querying again at zero. We'll fix this in userspace as well, but for the kernel, return -ENOENT if we ask for the next quota ID past UINT_MAX so the caller knows to stop. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: sanity check inode di_modeAmir Goldstein
Check for invalid file type in xfs_dinode_verify() and fail to load the inode structure from disk. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: sanity check inode mode when creating new dentryAmir Goldstein
The helper xfs_dentry_to_name() is used by 2 different classes of callers: Callers that pass zero mode and don't care about the returned name.type field and Callers that pass non zero mode and do care about the name.type field. Change xfs_dentry_to_name() to not take the mode argument and change the call sites of the first class to not pass the mode argument. Create a new helper xfs_dentry_mode_to_name() which does pass the mode argument and returns -EFSCORRUPTED if mode is invalid. Callers that translate non zero mode to on-disk file type now check the return value and will export the error to user instead of staging an invalid file type to be written to directory entry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: replace xfs_mode_to_ftype table with switch statementAmir Goldstein
The size of the xfs_mode_to_ftype[] conversion table was too small to handle an invalid value of mode=S_IFMT. Instead of fixing the table size, replace the conversion table with a conversion helper that uses a switch statement. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: add missing include dependencies to xfs_dir2.hAmir Goldstein
xfs_dir2.h dereferences some data types in inline functions and fails to include those type definitions, e.g.: xfs_dir2_data_aoff_t, struct xfs_da_geometry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: sanity check directory inode di_sizeAmir Goldstein
This changes fixes an assertion hit when fuzzing on-disk i_mode values. The easy case to fix is when changing an empty file i_mode to S_IFDIR. In this case, xfs_dinode_verify() detects an illegal zero size for directory and fails to load the inode structure from disk. For the case of non empty file whose i_mode is changed to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock() is replaced with return -EFSCORRUPTED, to avoid interacting with corrupted jusk also when XFS_DEBUG is disabled. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17xfs: make the ASSERT() condition likelyAmir Goldstein
The ASSERT() condition is the normal case, not the exception, so testing the condition should be likely(), not unlikely(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-17ubifs: Fix journal replay wrt. xattr nodesRichard Weinberger
When replaying the journal it can happen that a journal entry points to a garbage collected node. This is the case when a power-cut occurred between a garbage collect run and a commit. In such a case nodes have to be read using the failable read functions to detect whether the found node matches what we expect. One corner case was forgotten, when the journal contains an entry to remove an inode all xattrs have to be removed too. UBIFS models xattr like directory entries, so the TNC code iterates over all xattrs of the inode and removes them too. This code re-uses the functions for walking directories and calls ubifs_tnc_next_ent(). ubifs_tnc_next_ent() expects to be used only after the journal and aborts when a node does not match the expected result. This behavior can render an UBIFS volume unmountable after a power-cut when xattrs are used. Fix this issue by using failable read functions in ubifs_tnc_next_ent() too when replaying the journal. Cc: stable@vger.kernel.org Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Reported-by: Rock Lee <rockdotlee@gmail.com> Reviewed-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-17ubifs: remove redundant checks for encryption keyEric Biggers
In several places, ubifs checked for an encryption key before creating a file in an encrypted directory. This was redundant with fscrypt_setup_filename() or ubifs_new_inode(), and in the case of ubifs_link() it broke linking to special files. So remove the extra checks. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-17ubifs: allow encryption ioctls in compat modeEric Biggers
The ubifs encryption ioctls did not work when called by a 32-bit program on a 64-bit kernel. Since 'struct fscrypt_policy' is not affected by the word size, ubifs just needs to allow these ioctls through, like what ext4 and f2fs do. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-17ubifs: add CONFIG_BLOCK dependency for encryptionArnd Bergmann
This came up during the v4.10 merge window: warning: (UBIFS_FS_ENCRYPTION) selects FS_ENCRYPTION which has unmet direct dependencies (BLOCK) fs/crypto/crypto.c: In function 'fscrypt_zeroout_range': fs/crypto/crypto.c:355:9: error: implicit declaration of function 'bio_alloc';did you mean 'd_alloc'? [-Werror=implicit-function-declaration] bio = bio_alloc(GFP_NOWAIT, 1); The easiest way out is to limit UBIFS_FS_ENCRYPTION to configurations that also enable BLOCK. Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-17ubifs: fix unencrypted journal writePeter Rosin
Without this, I get the following on reboot: UBIFS error (ubi1:0 pid 703): ubifs_load_znode: bad target node (type 1) length (8240) UBIFS error (ubi1:0 pid 703): ubifs_load_znode: have to be in range of 48-4144 UBIFS error (ubi1:0 pid 703): ubifs_load_znode: bad indexing node at LEB 13:11080, error 5 magic 0x6101831 crc 0xb1cb246f node_type 9 (indexing node) group_type 0 (no node group) sqnum 546 len 128 child_cnt 5 level 0 Branches: 0: LEB 14:72088 len 161 key (133, inode) 1: LEB 14:81120 len 160 key (134, inode) 2: LEB 20:26624 len 8240 key (134, data, 0) 3: LEB 14:81280 len 160 key (135, inode) 4: LEB 20:34864 len 8240 key (135, data, 0) UBIFS warning (ubi1:0 pid 703): ubifs_ro_mode.part.0: switched to read-only mode, error -22 CPU: 0 PID: 703 Comm: mount Not tainted 4.9.0-next-20161213+ #1197 Hardware name: Atmel SAMA5 [<c010d2ac>] (unwind_backtrace) from [<c010b250>] (show_stack+0x10/0x14) [<c010b250>] (show_stack) from [<c024df94>] (ubifs_jnl_update+0x2e8/0x614) [<c024df94>] (ubifs_jnl_update) from [<c0254bf8>] (ubifs_mkdir+0x160/0x204) [<c0254bf8>] (ubifs_mkdir) from [<c01a6030>] (vfs_mkdir+0xb0/0x104) [<c01a6030>] (vfs_mkdir) from [<c0286070>] (ovl_create_real+0x118/0x248) [<c0286070>] (ovl_create_real) from [<c0283ed4>] (ovl_fill_super+0x994/0xaf4) [<c0283ed4>] (ovl_fill_super) from [<c019c394>] (mount_nodev+0x44/0x9c) [<c019c394>] (mount_nodev) from [<c019c4ac>] (mount_fs+0x14/0xa4) [<c019c4ac>] (mount_fs) from [<c01b5338>] (vfs_kern_mount+0x4c/0xd4) [<c01b5338>] (vfs_kern_mount) from [<c01b6b80>] (do_mount+0x154/0xac8) [<c01b6b80>] (do_mount) from [<c01b782c>] (SyS_mount+0x74/0x9c) [<c01b782c>] (SyS_mount) from [<c0107f80>] (ret_fast_syscall+0x0/0x3c) UBIFS error (ubi1:0 pid 703): ubifs_mkdir: cannot create directory, error -22 overlayfs: failed to create directory /mnt/ovl/work/work (errno: 22); mounting read-only Fixes: 7799953b34d1 ("ubifs: Implement encrypt/decrypt for all IO") Signed-off-by: Peter Rosin <peda@axentia.se> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-17ubifs: ensure zero err is returned on successful returnColin Ian King
err is no longer being set on a successful return path, causing a garbage value being returned. Fix this by setting err to zero for the successful return path. Found with static analysis by CoverityScan, CID 1389473 Fixes: 7799953b34d18 ("ubifs: Implement encrypt/decrypt for all IO") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-01-16Merge tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - fix invalid fget()/fput() calls when doing file locking - fix multiple directory cache invalidation issues due to the client failing to recognise that the directory wasn't changed - fix client recovery when server reboots multiple times * tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix client recovery when server reboots multiple times NFSv4: update_changeattr should update the attribute timestamp NFSv4: Don't call update_changeattr() unless the unlink is successful NFSv4: Don't apply change_info4 twice on rename within a directory NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created nfs: Don't take a reference on fl->fl_file for LOCK operation
2017-01-16Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for older bugs" * tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux: svcrdma: avoid duplicate dma unmapping during error recovery sunrpc: don't call sleeping functions from the notifier block callbacks svcrpc: don't leak contexts on PROC_DESTROY nfsd: fix supported attributes for acl & labels
2017-01-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This tree contains 4 fixes. The first is a fix for a race that can causes oopses under the right circumstances, and that someone just recently encountered. Past that are several small trivial correct fixes. A real issue that was blocking development of an out of tree driver, but does not appear to have caused any actual problems for in-tree code. A potential deadlock that was reported by lockdep. And a deadlock people have experienced and took the time to track down caused by a cleanup that removed the code to drop a reference count" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sysctl: Drop reference added by grab_header in proc_sys_readdir pid: fix lockdep deadlock warning due to ucount_lock libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount mnt: Protect the mountpoint hashtable with mount_lock
2017-01-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro. The most notable fix here is probably the fix for a splice regression ("fix a fencepost error in pipe_advance()") noticed by Alan Wylie. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix a fencepost error in pipe_advance() coredump: Ensure proper size of sparse core files aio: fix lock dep warning tmpfs: clear S_ISGID when setting posix ACLs
2017-01-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - the virtio_blk stack DMA corruption fix from Christoph, fixing and issue with VMAP stacks. - O_DIRECT blkbits calculation fix from Chandan. - discard regression fix from Christoph. - queue init error handling fixes for nbd and virtio_blk, from Omar and Jeff. - two small nvme fixes, from Christoph and Guilherme. - rename of blk_queue_zone_size and bdev_zone_size to _sectors instead, to more closely follow what we do in other places in the block layer. This interface is new for this series, so let's get the naming right before releasing a kernel with this feature. From Damien. * 'for-linus' of git://git.kernel.dk/linux-block: block: don't try to discard from __blkdev_issue_zeroout sd: remove __data_len hack for WRITE SAME nvme: use blk_rq_payload_bytes scsi: use blk_rq_payload_bytes block: add blk_rq_payload_bytes block: Rename blk_queue_zone_size and bdev_zone_size nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too nvme-rdma: fix nvme_rdma_queue_is_ready virtio_blk: fix panic in initialization error path nbd: blk_mq_init_queue returns an error code on failure, not NULL virtio_blk: avoid DMA to stack for the sense buffer do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
2017-01-14coredump: Ensure proper size of sparse core filesDave Kleikamp
If the last section of a core file ends with an unmapped or zero page, the size of the file does not correspond with the last dump_skip() call. gdb complains that the file is truncated and can be confusing to users. After all of the vma sections are written, make sure that the file size is no smaller than the current file position. This problem can be demonstrated with gdb's bigcore testcase on the sparc architecture. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14aio: fix lock dep warningShaohua Li
lockdep reports a warnning. file_start_write/file_end_write only acquire/release the lock for regular files. So checking the files in aio side too. [ 453.532141] ------------[ cut here ]------------ [ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670 [ 453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0) [ 453.533011] Modules linked in: [ 453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964 [ 453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014 [ 453.533011] ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000 [ 453.533011] ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008 [ 453.533011] ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef [ 453.533011] Call Trace: [ 453.533011] [<ffffffff8196cffb>] dump_stack+0x67/0x9c [ 453.533011] [<ffffffff81091ee1>] __warn+0x111/0x130 [ 453.533011] [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0 [ 453.533011] [<ffffffff81091f00>] ? __warn+0x130/0x130 [ 453.533011] [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60 [ 453.533011] [<ffffffff811205d4>] lock_release+0x434/0x670 [ 453.533011] [<ffffffff8198af94>] ? import_single_range+0xd4/0x110 [ 453.533011] [<ffffffff81322195>] ? rw_verify_area+0x65/0x140 [ 453.533011] [<ffffffff813aa696>] ? aio_write+0x1f6/0x280 [ 453.533011] [<ffffffff813aa6c9>] aio_write+0x229/0x280 [ 453.533011] [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640 [ 453.533011] [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0 [ 453.533011] [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30 [ 453.533011] [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40 [ 453.533011] [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0 [ 453.533011] [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10 [ 453.533011] [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10 [ 453.533011] [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270 [ 453.533011] [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0 [ 453.533011] [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 453.533011] [<ffffffff813acb90>] SyS_io_submit+0x10/0x20 [ 453.533011] [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad [ 453.533011] [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110 [ 453.533011] ---[ end trace b2fbe664d1cc0082 ]--- Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-13Merge branch 'for-linus-4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are all over the place. The tracepoint part of the pull fixes a crash and adds a little more information to two tracepoints, while the rest are good old fashioned fixes" * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: make tracepoint format strings more compact Btrfs: add truncated_len for ordered extent tracepoints Btrfs: add 'inode' for extent map tracepoint btrfs: fix crash when tracepoint arguments are freed by wq callbacks Btrfs: adjust outstanding_extents counter properly when dio write is split Btrfs: fix lockdep warning about log_mutex Btrfs: use down_read_nested to make lockdep silent btrfs: fix locking when we put back a delayed ref that's too new btrfs: fix error handling when run_delayed_extent_op fails btrfs: return the actual error value from from btrfs_uuid_tree_iterate
2017-01-13Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two small fixups for the filesystem changes that went into this merge window" * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client: ceph: fix get_oldest_context() ceph: fix mds cluster availability check
2017-01-13NFSv4: Fix client recovery when server reboots multiple timesTrond Myklebust
If the server reboots multiple times, the client should rely on the server to tell it that it cannot reclaim state as per section 9.6.3.4 in RFC7530 and section 8.4.2.1 in RFC5661. Currently, the client is being to conservative, and is assuming that if the server reboots while state recovery is in progress, then it must ignore state that was not recovered before the reboot. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-13fuse: fix time_to_jiffies nsec sanity checkDavid Sheets
Commit bcb6f6d2b9c2 ("fuse: use timespec64") introduced clamped nsec values in time_to_jiffies but used the max of nsec and NSEC_PER_SEC - 1 instead of the min. Because of this, dentries would stay in the cache longer than requested and go stale in scenarios that relied on their timely eviction. Fixes: bcb6f6d2b9c2 ("fuse: use timespec64") Signed-off-by: David Sheets <dsheets@docker.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # 4.9
2017-01-13fuse: clear FR_PENDING flag when moving requests out of pending queueTahsin Erdogan
fuse_abort_conn() moves requests from pending list to a temporary list before canceling them. This operation races with request_wait_answer() which also tries to remove the request after it gets a fatal signal. It checks FR_PENDING flag to determine whether the request is still in the pending list. Make fuse_abort_conn() clear FR_PENDING flag so that request_wait_answer() does not remove the request from temporary list. This bug causes an Oops when trying to delete an already deleted list entry in end_requests(). Fixes: ee314a870e40 ("fuse: abort: no fc->lock needed for request ending") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # 4.2+
2017-01-12nfsd: fix supported attributes for acl & labelsJ. Bruce Fields
Oops--in 916d2d844afd I moved some constants into an array for convenience, but here I'm accidentally writing to that array. The effect is that if you ever encounter a filesystem lacking support for ACLs or security labels, then all queries of supported attributes will report that attribute as unsupported from then on. Fixes: 916d2d844afd "nfsd: clean up supported attribute handling" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12NFSv4: update_changeattr should update the attribute timestampTrond Myklebust
Otherwise, the attribute cache remains marked as being expired. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-12NFSv4: Don't call update_changeattr() unless the unlink is successfulTrond Myklebust
If the unlink wasn't successful, then the directory has presumably not changed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-12NFSv4: Don't apply change_info4 twice on rename within a directoryTrond Myklebust
If a file is renamed, but stays in the same directory, we will still receive 2 change_info4 structures describing the change to that directory, but we only want to apply it once. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-12NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was createdTrond Myklebust
We don't want to invalidate the directory attribute and data cache unless we know that a file was created, or the change attribute differs from the one in our cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-12Merge tag 'xfs-for-linus-4.10-rc4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "As promised last week, here's some stability fixes from Christoph and Jan Kara: - fix free space request handling when low on disk space - remove redundant log failure error messages - free truncated dirty pages instead of letting them build up forever" * tag 'xfs-for-linus-4.10-rc4-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Timely free truncated dirty pages xfs: don't print warnings when xfs_log_force fails xfs: don't rely on ->total in xfs_alloc_space_available xfs: adjust allocation length in xfs_alloc_space_available xfs: fix bogus minleft manipulations xfs: bump up reserved blocks in xfs_alloc_set_aside
2017-01-12ceph: fix get_oldest_context()Geng, Jichao
For no snapshot case, we should use ci->truncate_{seq,size}. Fixes: 5f743e456606 ("ceph: record truncate size/seq for snap data writeback") Signed-off-by: Geng, Jichao <geng.jichao@h3c.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2017-01-12ceph: fix mds cluster availability checkYan, Zheng
We should apply the check after getting the initial mdsmap. Fixes: e9e427f0a14f ("ceph: check availability of mds cluster on mount") Link: http://tracker.ceph.com/issues/18161 Signed-off-by: Yan, Zheng <zyan@redhat.com>
2017-01-12nfs: Don't take a reference on fl->fl_file for LOCK operationBenjamin Coddington
I have reports of a crash that look like __fput() was called twice for a NFSv4.0 file. It seems possible that the state manager could try to reclaim a lock and take a reference on the fl->fl_file at the same time the file is being released if, during the close(), a signal interrupts the wait for outstanding IO while removing locks which then skips the removal of that lock. Since 83bfff23e9ed ("nfs4: have do_vfs_lock take an inode pointer") has removed the need to traverse fl->fl_file->f_inode in nfs4_lock_done(), taking that reference is no longer necessary. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-12block: Rename blk_queue_zone_size and bdev_zone_sizeDamien Le Moal
All block device data fields and functions returning a number of 512B sectors are by convention named xxx_sectors while names in the form xxx_size are generally used for a number of bytes. The blk_queue_zone_size and bdev_zone_size functions were not following this convention so rename them. No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Collapsed the two patches, they were nonsensically split and broke bisection. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-11xfs: Timely free truncated dirty pagesJan Kara
Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started to skip dirty pages in xfs_vm_releasepage() which also has the effect that if a dirty page is truncated, it does not get freed by block_invalidatepage() and is lingering in LRU list waiting for reclaim. So a simple loop like: while true; do dd if=/dev/zero of=file bs=1M count=100 rm file done will keep using more and more memory until we hit low watermarks and start pagecache reclaim which will eventually reclaim also the truncate pages. Keeping these truncated (and thus never usable) pages in memory is just a waste of memory, is unnecessarily stressing page cache reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM prematurely. So instead of just skipping dirty pages in xfs_vm_releasepage(), return to old behavior of skipping them only if they have delalloc or unwritten buffers and fix the spurious warnings by warning only if the page is clean. CC: stable@vger.kernel.org CC: Brian Foster <bfoster@redhat.com> CC: Vlastimil Babka <vbabka@suse.cz> Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz> Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-11Merge branch 'tracepoint-updates-4.10' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.10
2017-01-10ocfs2: fix crash caused by stale lvb with fsdlm pluginEric Ren
The crash happens rather often when we reset some cluster nodes while nodes contend fiercely to do truncate and append. The crash backtrace is below: dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18) ocfs2: End replay journal (node 318952601, slot 2) on device (253,18) ocfs2: Beginning quota recovery on device (253,18) for slot 2 ocfs2: Finishing quota recovery on device (253,18) for slot 2 (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode) (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1 ------------[ cut here ]------------ kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: No, Unsupported modules are loaded CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014 task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000 RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282 RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414 R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448 R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020 FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0 Call Trace: ocfs2_setattr+0x698/0xa90 [ocfs2] notify_change+0x1ae/0x380 do_truncate+0x5e/0x90 do_sys_ftruncate.constprop.11+0x108/0x160 entry_SYSCALL_64_fastpath+0x12/0x6d Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] It's because ocfs2_inode_lock() get us stale LVB in which the i_size is not equal to the disk i_size. We mistakenly trust the LVB because the underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for us. But, why? The current code tries to downconvert lock without DLM_LKF_VALBLK flag to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even if the lock resource type needs LVB. This is not the right way for fsdlm. The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node failure happens. The following diagram briefly illustrates how this crash happens: RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB; The 1st round: Node1 Node2 RSB1: PR RSB1(master): NULL->EX ocfs2_downconvert_lock(PR->NULL, set_lvb==0) ocfs2_dlm_lock(no DLM_LKF_VALBLK) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - dlm_lock(no DLM_LKF_VALBLK) convert_lock(overwrite lkb->lkb_exflags with no DLM_LKF_VALBLK) RSB1: NULL RSB1: EX reset Node2 dlm_recover_rsbs() recover_lvb() /* The LVB is not trustable if the node with EX fails and * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1. */ if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to return; * to invalid the LVB here. */ The 2nd round: Node 1 Node2 RSB1(become master from recovery) ocfs2_setattr() ocfs2_inode_lock(NULL->EX) /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */ ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */ ocfs2_truncate_file() mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */ The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is uesed. Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10dax: wrprotect pmd_t in dax_mapping_entry_mkcleanRoss Zwisler
Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss in the following sequence: 1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the pmd_t dirty and writeable 2) fsync, flushing out PMD data and cleaning the radix tree entry. We currently fail to mark the pmd_t as clean and write protected. 3) more mmap writes to the PMD. These don't cause any page faults since the pmd_t is dirty and writeable. The radix tree entry remains clean. 4) fsync, which fails to flush the dirty PMD data because the radix tree entry was clean. 5) crash - dirty data that should have been fsync'd as part of 4) could still have been in the processor cache, and is lost. Fix this by marking the pmd_t clean and write protected in dax_mapping_entry_mkclean(), which is called as part of the fsync operation 2). This will cause the writes in step 3) above to generate page faults where we'll re-dirty the PMD radix tree entry, resulting in flushes in the fsync that happens in step 4). Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush") Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10do_direct_IO: Use inode->i_blkbits to compute block count to be cleanedChandan Rajendra
The code currently uses sdio->blkbits to compute the number of blocks to be cleaned. However sdio->blkbits is derived from the logical block size of the underlying block device (Refer to the definition of do_blockdev_direct_IO()). Due to this, generic/299 test would rarely fail when executed on an ext4 filesystem with 64k as the block size and when using a virtio based disk (having 512 byte as the logical block size) inside a kvm guest. This commit fixes the bug by using inode->i_blkbits to compute the number of blocks to be cleaned. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up by Jeff Moyer to only use/evaluate inode->i_blkbits once, to avoid issues with block size changes with IO in flight. Signed-off-by: Jens Axboe <axboe@fb.com>