path: root/fs
2016-03-30  jfs: Remove unnecessary line continuations and terminating newlines (Joe Perches)
These jfs_<level> uses need neither a line continuation to assemble the format strings nor newline terminations in the formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2016-03-30  jfs: Remove terminating newlines from jfs_info, jfs_warn, jfs_err uses (Joe Perches)
These macros add the newline so these cause extra blank lines in logging output. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
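For illustration, the kind of call site this cleans up looks roughly like the following (a hypothetical example, not a line quoted from the patch):

  /* before: jfs_info() already appends '\n', so this logged an extra blank line */
  jfs_info("In jfs_put_super\n");

  /* after */
  jfs_info("In jfs_put_super");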
2016-03-29  fs: kernfs: Replace CURRENT_TIME by current_fs_time() (Deepa Dinamani)
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe. CURRENT_TIME macro will be deleted before merging the aforementioned series. Use current_fs_time() instead of CURRENT_TIME for inode timestamps. struct kernfs_node is associated with a sysfs file/ directory. Truncate the values to appropriate time granularity when writing to inode timestamps of the files. ktime_get_real_ts() is used to obtain times for struct kernfs_iattrs. Since these times are later assigned to inode times using timespec_truncate() for all filesystem based operations, we can save the supers list traversal time here by using ktime_get_real_ts() directly. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
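An illustrative fragment of the substitution being described (a sketch, not the literal patch; current_fs_time() takes the super_block so the timestamp is truncated to that filesystem's granularity):

  /* before: no granularity handling */
  inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;

  /* after: truncated to the granularity of the inode's filesystem */
  inode->i_atime = inode->i_mtime = inode->i_ctime =
          current_fs_time(inode->i_sb);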
2016-03-29  fs: debugfs: Replace CURRENT_TIME by current_fs_time() (Deepa Dinamani)
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-29  debugfs: fix inode i_nlink references for automount dentry (Roman Pen)
Directory inodes should start off with i_nlink == 2 (one extra ref for "." entry). debugfs_create_automount() increases neither the i_nlink reference for the current inode nor for the parent inode. On attempt to remove the automount dentry, the kernel complains:

  [ 86.288070] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50()
  [ 86.288461] Modules linked in: debugfs_example2(O-)
  [ 86.288745] CPU: 1 PID: 3616 Comm: rmmod Tainted: G O 4.4.0-rc3-next-20151207+ #135
  [ 86.289197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014
  [ 86.289696] ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000
  [ 86.290110] ffff8800b9e6fdd8 ffffffff81065142 ffff8801399175e8 ffff8800bb78b240
  [ 86.290507] ffff8801399175e8 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8
  [ 86.290933] Call Trace:
  [ 86.291080] [<ffffffff81352e2c>] dump_stack+0x4e/0x82
  [ 86.291340] [<ffffffff81065142>] warn_slowpath_common+0x82/0xc0
  [ 86.291640] [<ffffffff8106523a>] warn_slowpath_null+0x1a/0x20
  [ 86.291932] [<ffffffff811ae62e>] drop_nlink+0x3e/0x50
  [ 86.292208] [<ffffffff811ba35b>] simple_unlink+0x4b/0x60
  [ 86.292481] [<ffffffff811ba3a7>] simple_rmdir+0x37/0x50
  [ 86.292748] [<ffffffff812d9808>] __debugfs_remove.part.16+0xa8/0xd0
  [ 86.293082] [<ffffffff812d9a0b>] debugfs_remove_recursive+0xdb/0x1c0
  [ 86.293406] [<ffffffffa00004dd>] cleanup_module+0x2d/0x3b [debugfs_example2]
  [ 86.293762] [<ffffffff810d959b>] SyS_delete_module+0x16b/0x220
  [ 86.294077] [<ffffffff818ef857>] entry_SYSCALL_64_fastpath+0x12/0x6a
  [ 86.294405] ---[ end trace c9fc53353fe14a36 ]---
  [ 86.294639] ------------[ cut here ]------------

To reproduce the issue it is enough to invoke these lines:

  autom = debugfs_create_automount("automount", NULL, vfsmount_cb, data);
  BUG_ON(IS_ERR_OR_NULL(autom));
  debugfs_remove(autom);

The issue is fixed by increasing inode i_nlink references for current and parent inodes. Signed-off-by: Roman Pen <r.peniaev@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
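In shape, the fix amounts to bumping the link counts when the automount dentry is set up, roughly as follows (a sketch of the idea only; exact placement in debugfs_create_automount() is not shown here):

  inc_nlink(inode);                        /* account for "." of the new directory */
  d_instantiate(dentry, inode);
  inc_nlink(d_inode(dentry->d_parent));    /* account for ".." back-reference in the parent */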
2016-03-29  chrdev: emit a warning when we go below dynamic major range (Linus Walleij)
Currently a dynamically allocated character device major is taken from 254 and downward. This mechanism is used for RTC, IIO and a few other subsystems. The kernel currently has no check preventing these dynamic allocations from eating into the assigned numbers at 233 and downward. In a recent test it was reported that so many dynamic device majors were used on a test server that the major number for infiniband (231) was stolen. This occurred when allocating a new major number for GPIO chips. The error messages from the kernel were not helpful. (See: https://lkml.org/lkml/2016/2/14/124) This patch adds a defined lower limit to the dynamic major allocation region and henceforth emits a warning if we start to eat into the assigned numbers. It makes no semantic changes and will not change the kernel's behaviour: numbers will still continue to be stolen, but we will know from dmesg what is going on. This also updates Documentation/devices.txt to clearly reflect that we are using this range of major numbers for dynamic allocation. Reported-by: Ying Huang <ying.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
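Conceptually the added check looks something like this (a sketch; the constant name and message wording here are illustrative, not necessarily those used in the patch):

  #define CHRDEV_MAJOR_DYN_END 234   /* bottom of the dynamic allocation range */

  if (major < CHRDEV_MAJOR_DYN_END)
          pr_warn("CHRDEV \"%s\" major number %d goes below the dynamic allocation range\n",
                  name, major);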
2016-03-29  ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas (Jan Kara)
When Q_GETNEXTQUOTA is called for a filesystem with quotas disabled, ocfs2_get_next_id() oopses. Fix the problem by checking early whether the filesystem has quotas enabled. Signed-off-by: Jan Kara <jack@suse.cz>
2016-03-29  dlm: config: Fix ENOMEM failures in make_cluster() (Andrew Price)
Commit 1ae1602de0 "configfs: switch ->default groups to a linked list" left the NULL gps pointer behind after removing the kcalloc() call which made it non-NULL. It also left the !gps check in place so make_cluster() now fails with ENOMEM. Remove the remaining uses of the gps variable to fix that. Reviewed-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2016-03-29  quota: Handle Q_GETNEXTQUOTA when quota is disabled (Jan Kara)
Currently we oops when Q_GETNEXTQUOTA is called while quota is disabled. Properly check whether quota is enabled for the filesystem before calling into the quota format handler. Reported-by: Ted Tso <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
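The early check is conceptually of this shape (a sketch using the existing sb_has_quota_active() helper, not the literal hunk):

  if (!sb_has_quota_active(sb, qid->type))
          return -ESRCH;   /* quota not enabled for this type, nothing to iterate */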
2016-03-28  cifs: don't bother with kmap on read_pages side (Al Viro)
just do ITER_BVEC recvmsg Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  cifs_readv_receive: use cifs_read_from_socket() (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  cifs: no need to wank with copying and advancing iovec on recvmsg side either (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  cifs: quit playing games with draining iovecs (Al Viro)
... and use ITER_BVEC for the page part of request to send Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  cifs: merge the hash calculation helpers (Al Viro)
three practically identical copies... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  jfs: Remove unnecessary code in jfs_get_acl (Andreas Gruenbacher)
The get_acl inode operation is called only when no ACL is cached. It makes no sense to check for a cached ACL as the first thing inside such inode operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
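In other words, a prologue of this shape inside a ->get_acl() implementation is dead code, since the VFS only calls ->get_acl() after get_cached_acl() has already missed (illustrative fragment, not the removed jfs hunk verbatim):

  struct posix_acl *acl = get_cached_acl(inode, type);

  if (acl != ACL_NOT_CACHED)
          return acl;   /* never taken when invoked via ->get_acl() */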
2016-03-28  reiserfs_cache_default_acl(): use get_acl() (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  f2fs: cover large section in sanity check of super (Jaegeuk Kim)
This patch fixes the bug which does not cover a large section case when checking the sanity of superblock. If f2fs detects misalignment, it will fix the superblock during the mount time, so it doesn't need to trigger fsck.f2fs further. Reported-by: Matthias Prager <linux@matthiasprager.de> Reported-by: David Gnedt <david.gnedt@davizone.at> Cc: stable 4.5+ <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-28  constify security_path_{mkdir,mknod,symlink} (Al Viro)
... as well as unix_mknod() and may_o_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  constify chmod_common/security_path_chmod (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  constify chown_common/security_path_chown (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28  constify vfs_truncate() (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-27  __d_alloc(): treat NULL name as QSTR("/", 1) (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-26  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds)
Pull Ceph updates from Sage Weil:
 "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others"

[ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
  libceph: use KMEM_CACHE macro
  ceph: use kmem_cache_zalloc
  rbd: use KMEM_CACHE macro
  ceph: use lookup request to revalidate dentry
  ceph: kill ceph_get_dentry_parent_inode()
  ceph: fix security xattr deadlock
  ceph: don't request vxattrs from MDS
  ceph: fix mounting same fs multiple times
  ceph: remove unnecessary NULL check
  ceph: avoid updating directory inode's i_size accidentally
  ceph: fix race during filling readdir cache
  libceph: use sizeof_footer() more
  ceph: kill ceph_empty_snapc
  ceph: fix a wrong comparison
  ceph: replace CURRENT_TIME by current_fs_time()
  ceph: scattered page writeback
  libceph: add helper that duplicates last extent operation
  libceph: enable large, variable-sized OSD requests
  libceph: osdc->req_mempool should be backed by a slab pool
  libceph: make r_request msg_size calculation clearer
  ...
2016-03-26  ext4 crypto: use dget_parent() in ext4_d_revalidate() (Theodore Ts'o)
This avoids potential problems caused by a race where the inode gets renamed out from its parent directory and the parent directory is deleted while ext4_d_revalidate() is running. Fixes: 28b4c263961c Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
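The general idiom is to pin the parent for the duration of the check instead of chasing d_parent directly (a sketch of the pattern, not the exact ext4 hunk):

  struct dentry *dir = dget_parent(dentry);   /* takes a reference on the parent */
  struct inode *dir_inode = d_inode(dir);

  /* ... use dir_inode safely even if a rename/unlink races with us ... */

  dput(dir);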
2016-03-26  ext4: use file_dentry() (Miklos Szeredi)
EXT4 may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: Daniel Axtens <dja@axtens.net> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Fixes: ff978b09f973 ("ext4 crypto: move context consistency check to ext4_file_open()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> # v4.5
2016-03-26  ext4: use dget_parent() in ext4_file_open() (Miklos Szeredi)
In f_op->open() lock on parent is not held, so there's no guarantee that parent dentry won't go away at any time. Even after this patch there's no guarantee that 'dir' will stay the parent of 'inode', but at least it won't be freed while being used. Fixes: ff978b09f973 ("ext4 crypto: move context consistency check to ext4_file_open()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.5
2016-03-26  nfs: use file_dentry() (Miklos Szeredi)
NFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.2 Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-03-26  fs: add file_dentry() (Miklos Szeredi)
This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay"). Regular files opened on overlayfs will result in the file being opened on the underlying filesystem, while f_path points to the overlayfs mount/dentry. This confuses filesystems which get the dentry from struct file and assume it's theirs. Add a new helper, file_dentry() [*], to get the filesystem's own dentry from the file. This checks file->f_path.dentry->d_flags against DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not set (this is the common, non-overlayfs case). In the uncommon case it will call into overlayfs's ->d_real() to get the underlying dentry, matching file_inode(file). The reason we need to check against the inode is that if the file is copied up while being open, d_real() would return the upper dentry, while the open file comes from the lower dentry. [*] If possible, it's better simply to use file_inode() instead. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: <stable@vger.kernel.org> # v4.2 Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Axtens <dja@axtens.net>
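Based on the description above, the helper is roughly the following (a sketch; see include/linux/fs.h for the authoritative version):

  static inline struct dentry *file_dentry(const struct file *file)
  {
          struct dentry *dentry = file->f_path.dentry;

          /* common case: not overlayfs, the dentry already belongs to the filesystem */
          if (likely(!(dentry->d_flags & DCACHE_OP_REAL)))
                  return dentry;

          /* overlayfs: ask ->d_real() for the underlying dentry matching file_inode(file) */
          return dentry->d_op->d_real(dentry, file_inode(file));
  }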
2016-03-26  ext4 crypto: don't let data integrity writebacks fail with ENOMEM (Theodore Ts'o)
We don't want the writeback triggered from the journal commit (in data=writeback mode) to cause the journal to abort due to generic_writepages() returning an ENOMEM error. In addition, if fsync() fails with ENOMEM, most applications will probably not do the right thing. So if we are doing a data integrity sync, and ext4_encrypt() returns ENOMEM, we will submit any queued I/O to date, and then retry the allocation using GFP_NOFAIL. Google-Bug-Id: 27641567 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-26  Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux (Linus Torvalds)
Pull orangefs filesystem from Mike Marshall.

This finally merges the long-pending orangefs filesystem, which has been much cleaned up with input from Al Viro over the last six months. From the documentation file:

 "OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal for large storage problems faced by HPC, BigData, Streaming Video, Genomics, Bioinformatics.

  Orangefs, originally called PVFS, was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for Parallel Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns of parallel programs.

  Orangefs features include:

    - Distributes file data among multiple file servers
    - Supports simultaneous access by multiple clients
    - Stores file data and metadata on servers using local file system and access methods
    - Userspace implementation is easy to install and maintain
    - Direct MPI support
    - Stateless"

see Documentation/filesystems/orangefs.txt for more in-depth details.

* tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
  orangefs: fix orangefs_superblock locking
  orangefs: fix do_readv_writev() handling of error halfway through
  orangefs: have ->kill_sb() evict the VFS side of things first
  orangefs: sanitize ->llseek()
  orangefs-bufmap.h: trim unused junk
  orangefs: saner calling conventions for getting a slot
  orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
  orangefs: get rid of readdir_handle_s
  ornagefs: ensure that truncate has an up to date inode size
  orangefs: move code which sets i_link to orangefs_inode_getattr
  orangefs: remove needless wrapper around GFP_KERNEL
  orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
  orangefs: refactor inode type or link_target change detection
  orangefs: use new getattr for revalidate and remove old getattr
  orangefs: use new getattr in inode getattr and permission
  orangefs: use new orangefs_inode_getattr to get size in write and llseek
  orangefs: use new orangefs_inode_getattr to create new inodes
  orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
  orangefs: remove inode->i_lock wrapper
  orangefs: put register_chrdev immediately before register_filesystem
  ...
2016-03-26  f2fs/crypto: fix xts_tweak initialization (Linus Torvalds)
Commit 0b81d07790726 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_" to just "FS" for preprocessor symbols). Because of the symbol renaming, it's a bit hard to see it as a file move: use git show -M30 0b81d07790726 to lower the rename detection to just 30% similarity and make git show the files as renamed (the header file won't be shown as a rename even then - since all it contains is symbol definitions, it looks almost completely different). Even with the renames showing as renames, the diffs are not all that easy to read, since so much is just the renames. But Eric Biggers noticed that it's not just all renames: the initialization of the xts_tweak had been broken too, using the inode number rather than the page offset. That's not right - it makes the xts_tweak the same for all pages of each inode. It _might_ make sense to make the xts_tweak contain both the offset _and_ the inode number, but not just the inode number. Reported-by: Eric Biggers <ebiggers3@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
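Schematically, the broken versus intended behaviour looks like this (an illustrative fragment only; the variable and constant names are assumptions, not the exact fs/crypto code):

  pgoff_t index = page->index;              /* page offset within the file */
  u8 xts_tweak[FS_XTS_TWEAK_SIZE] = { 0 };

  /* broken: every page of an inode gets the same tweak */
  memcpy(xts_tweak, &inode->i_ino, sizeof(inode->i_ino));

  /* intended: the tweak varies with the page offset */
  memcpy(xts_tweak, &index, sizeof(index));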
2016-03-26  orangefs: fix orangefs_superblock locking (Al Viro)
* switch orangefs_remount() to taking ORANGEFS_SB(sb) instead of sb
* remove from the list _before_ orangefs_unmount() - request_mutex in the latter will make sure that nothing observed in the loop in ORANGEFS_DEV_REMOUNT_ALL handling will get freed until the end of loop
* on removal, keep the forward pointer and zero the back one. That way we can drop and regain the spinlock in the loop body (again, ORANGEFS_DEV_REMOUNT_ALL one) and still be able to get to the rest of the list.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs: fix do_readv_writev() handling of error halfway through (Al Viro)
Error should only be returned if nothing had been read/written. Otherwise we need to report a short read/write instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
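The rule being applied is the usual VFS convention, roughly (a generic sketch, not the orangefs-specific code; total_count stands for the bytes already transferred):

  if (ret < 0)
          return total_count ? total_count : ret;   /* partial progress wins over the error */
  return total_count;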
2016-03-25  orangefs: have ->kill_sb() evict the VFS side of things first (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs: sanitize ->llseek() (Al Viro)
a) open files can't have NULL inodes b) it's SEEK_END, not ORANGEFS_SEEK_END; no need to get cute. c) make_bad_inode() on lseek()? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs-bufmap.h: trim unused junk (Al Viro)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs: saner calling conventions for getting a slot (Al Viro)
just have it return the slot number or -E... - the caller checks the sign anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer (Al Viro)
it's always __orangefs_bufmap Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  orangefs: get rid of readdir_handle_s (Al Viro)
no point, really - we couldn't keep those across the calls of getdents(); it would be too easy to DoS, having all slots exhausted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-25  ocfs2: extend enough credits for freeing one truncate record while replaying truncate records (Xue jiufei)
Now function ocfs2_replay_truncate_records() first modifies tl_used, then calls ocfs2_extend_trans() to extend transactions for gd and alloc inode used for freeing clusters. jbd2_journal_restart() may be called and it may happen that tl_used in truncate log is decreased but the clusters are not freed, which means these clusters are lost. So we should avoid extending transactions in these two operations. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: extend transaction for ocfs2_remove_rightmost_path() and ocfs2_update_edge_lengths() before to avoid inconsistency between inode and et (Xue jiufei)
I found that jbd2_journal_restart() is called in some places without keeping things consistent beforehand. jbd2_journal_restart() may commit the handle's transaction and restart another one. If the first transaction is committed successfully while the other is not, it may cause filesystem inconsistency or a read-only filesystem. This is an effort to fix this kind of problem.

This patch (of 3): The following functions will be called while truncating an extent:

  ocfs2_remove_btree_range
    -> ocfs2_start_trans
    -> ocfs2_remove_extent
    -> ocfs2_truncate_rec
    -> ocfs2_extend_rotate_transaction
    -> jbd2_journal_restart if jbd2_journal_extend fails
    -> ocfs2_rotate_tree_left
    -> ocfs2_remove_rightmost_path
    -> ocfs2_extend_rotate_transaction
    -> ocfs2_unlink_subtree
    -> ocfs2_update_edge_lengths
    -> ocfs2_extend_trans
    -> jbd2_journal_restart if jbd2_journal_extend fails
    -> ocfs2_et_update_clusters
    -> ocfs2_commit_trans

jbd2_journal_restart() may be called, and it may happen that the buffers dirtied in ocfs2_truncate_rec() are committed while the buffers dirtied in ocfs2_et_update_clusters() are not, so the total clusters on the extent tree and i_clusters in ocfs2_dinode become inconsistent. The cluster count read from ocfs2_dinode is then incorrect, and it also causes a read-only problem when ocfs2_commit_truncate() is called, with the error message: "Inode %llu has empty extent block at %llu". We should extend enough credits for ocfs2_remove_rightmost_path and ocfs2_update_edge_lengths to avoid this inconsistency. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert (xuejiufei)
We have found a bug when two nodes doing umount one after another. 1) Node 1 migrate a lockres that has 3 locks in grant queue such as N2(PR)<->N3(NL)<->N4(PR) to N2. After migration, lvb of the lock N3(NL) and N4(PR) are empty on node 2 because migration target do not copy lvb to these two lock. 2) Node 3 want to convert to PR, it can be granted in __dlmconvert_master(), and the order of these locks is unchanged. The lvb of the lock N3(PR) on node 2 is copyed from lockres in function dlm_update_lvb() while the lvb of lock N4(PR) is still empty. 3) Node 2 want to leave domain, it will migrate this lockres to node 3. Then node 2 will trigger the BUG in dlm_prepare_lvb_for_migration() when adding the lock N4(PR) to mres with the following message because the lvb of mres is already copied from lock N3(PR), but the lvb of lock N4(PR) is empty. "Mismatched lvb in lock cookie=%u:%llu, name=%.*s, node=%u" [akpm@linux-foundation.org: tweak comment] Signed-off-by: xuejiufei <xuejiufei@huawei.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: solve a problem of crossing the boundary in updating backups (jiangyiwen)
In update_backups() there exists a problem of crossing the boundary, as follows: assume that a lun is resized to 1TB (cluster_size is 32kb); it then contains clusters 0~33554431. In update_backups() the super block will be backed up at the location of 1TB, which is the 33554432th cluster, so a boundary crossing happens. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local (jiangyiwen)
This patch fixes a deadlock, as follows:

  1) Node 1: volumes a and b are mounted.  Node 2: only mounts vol a.  Node 3: only mounts vol b.
  2) Node 2: starts to mount b.  Node 3: starts to mount a.
  3) Node 2: checks hb of Node 3 in vol a, qs_holds++.  Node 3: checks hb of Node 2 in vol b, qs_holds++.
  4) All nodes' network goes down.
  5) Node 2: the mount of b fails and it then calls ocfs2_dismount_volume, but the process hangs, since there is a work item in ocfs2_wq that cannot be completed. That work is about vol a, because ocfs2_wq is a global wq. (Node 3 is in the same situation as Node 2.) BTW, the work scheduled in ocfs2_wq is ocfs2_orphan_scan_work, and its context needs to take the inode lock of orphan_dir; because the lockres owner is Node 1 and all nodes' network went down at the same time, it can't get the inode lock.
  6) Why can't this node be fenced when the network is disconnected? Because the hung mount process keeps qs_holds from reaching 0.

Because all works in ocfs2_wq are relative to the super block, the solution is to change ocfs2_wq from global to local; in other words, move it into struct ocfs2_super. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list (Joseph Qi)
When the master handles a convert request, it queues the ast first and then returns the status. It may happen that the ast is sent before the request status, because the two messages are sent by different threads. And right after the ast is sent, if the master goes down, it may trigger the BUG in dlm_move_lockres_to_recovery_list on the requesting node, because the ast handler moves the lock to the grant list without clearing lock->convert_pending. So remove the BUG_ON statement and check whether the ast has been processed in dlmconvert_remote. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2/dlm: fix race between convert and recovery (Joseph Qi)
There is a race window between dlmconvert_remote and dlm_move_lockres_to_recovery_list which will leave a lock with OCFS2_LOCK_BUSY in the grant list, thus the system hangs.

  dlmconvert_remote
  {
          spin_lock(&res->spinlock);
          list_move_tail(&lock->list, &res->converting);
          lock->convert_pending = 1;
          spin_unlock(&res->spinlock);

          status = dlm_send_remote_convert_request();
          >>>>>> race window: the master has queued the ast and returned
                 DLM_NORMAL, and then goes down before sending the ast.
                 This node detects the master down and calls
                 dlm_move_lockres_to_recovery_list, which reverts the lock
                 to the grant list. Then OCFS2_LOCK_BUSY won't be cleared,
                 as the new master won't send the ast any more because it
                 thinks it has already been authorized.

          spin_lock(&res->spinlock);
          lock->convert_pending = 0;
          if (status != DLM_NORMAL)
                  dlm_revert_pending_convert(res, lock);
          spin_unlock(&res->spinlock);
  }

In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING bit set (res is still in recovery) or whether the res master has changed (the new master has finished recovery); if so, reset the status to DLM_RECOVERING, and the convert will be retried. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write() (Ryan Ding)
The code should call ocfs2_free_alloc_context() to free meta_ac & data_ac before calling ocfs2_run_deallocs(), because ocfs2_run_deallocs() will acquire the system inode's i_mutex held by meta_ac. So release the lock before ocfs2_run_deallocs(). Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.") Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Acked-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
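The resulting ordering is: drop the allocation contexts (and the system-inode i_mutex pinned by meta_ac) first, then run the deallocation. A sketch of that ordering only, with locals (osb, dealloc, meta_ac, data_ac) assumed from the surrounding function:

  if (data_ac)
          ocfs2_free_alloc_context(data_ac);
  if (meta_ac)
          ocfs2_free_alloc_context(meta_ac);

  /* safe now: no longer holding the system inode's i_mutex via meta_ac */
  ocfs2_run_deallocs(osb, &dealloc);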
2016-03-25  ocfs2: fix disk file size and memory file size mismatch (Ryan Ding)
When doing append direct write in an already allocated cluster, and fast path in ocfs2_dio_get_block() is triggered, function ocfs2_dio_end_io_write() will be skipped as there is no context allocated. As a result, the disk file size will not be changed as it should be. The solution is to skip fast path when we are about to change file size. Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.") Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Acked-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write (Ryan Ding)
Take ip_alloc_sem to prevent concurrent access to extent tree, which may cause the extent tree in an unstable state. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25  ocfs2: fix ip_unaligned_aio deadlock with dio work queue (Ryan Ding)
In the current implementation of unaligned aio+dio, the lock order behaves as follows:

  in user process context:
    -> call io_submit()
    -> get i_mutex                <== window1
    -> get ip_unaligned_aio
    -> submit direct io to block device
    -> release i_mutex
    -> io_submit() returns

  in dio work queue context (the work queue is created in __blockdev_direct_IO):
    -> release ip_unaligned_aio   <== window2
    -> get i_mutex
    -> clear unwritten flag & change i_size
    -> release i_mutex

There is a limit on the number of threads in the dio work queue, 256 by default. If all 256 threads are in the 'window2' stage above and a user process is in the 'window1' stage, the system deadlocks: the user process holds i_mutex while waiting for the ip_unaligned_aio lock, while a direct bio holds the ip_unaligned_aio mutex and is waiting for a dio work queue thread to be scheduled, but all the dio work queue threads are waiting for the i_mutex lock in 'window2'. This case only happened in a test which sends a large number (more than 256) of aios in one io_submit() call. My design is to remove the ip_unaligned_aio lock and use a sync io instead; just like the ip_unaligned_aio lock, this serializes the unaligned aio dio. [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi] Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>