summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-02-29jffs2: Improve post-mount CRC scan efficiencyDavid Woodhouse
We need to finish doing the CRC checks before we can allow writes to happen, and we currently process the inodes in order. This means a call to jffs2_get_ino_cache() for each possible inode# up to c->highest_ino. There may be a lot of lookups which fail, if the inode# space is used sparsely. And the inode# space is *often* used sparsely, if a file system contains a lot of stuff that was put there in the original image, followed by lots of creation and deletion of new files. Instead of processing them numerically with a lookup each time, just walk the hash buckets instead. [fix locking typo reported by Dan Carpenter] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-02-29use ->d_seq to get coherency between ->d_inode and ->d_flagsAl Viro
Games with ordering and barriers are way too brittle. Just bump ->d_seq before and after updating ->d_inode and ->d_flags type bits, so that verifying ->d_seq would guarantee they are coherent. Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-29Fix cifs_uniqueid_to_ino_t() function for s390xYadan Fan
This issue is caused by commit 02323db17e3a7 ("cifs: fix cifs_uniqueid_to_ino_t not to ever return 0"), when BITS_PER_LONG is 64 on s390x, the corresponding cifs_uniqueid_to_ino_t() function will cast 64-bit fileid to 32-bit by using (ino_t)fileid, because ino_t (typdefed __kernel_ino_t) is int type. It's defined in arch/s390/include/uapi/asm/posix_types.h #ifndef __s390x__ typedef unsigned long __kernel_ino_t; ... #else /* __s390x__ */ typedef unsigned int __kernel_ino_t; So the #ifdef condition is wrong for s390x, we can just still use one cifs_uniqueid_to_ino_t() function with comparing sizeof(ino_t) and sizeof(u64) to choose the correct execution accordingly. Signed-off-by: Yadan Fan <ydfan@suse.com> CC: stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-29CIFS: Fix SMB2+ interim response processing for read requestsPavel Shilovsky
For interim responses we only need to parse a header and update a number credits. Now it is done for all SMB2+ command except SMB2_READ which is wrong. Fix this by adding such processing. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-29cifs: fix out-of-bounds access in lease parsingJustin Maggard
When opening a file, SMB2_open() attempts to parse the lease state from the SMB2 CREATE Response. However, the parsing code was not careful to ensure that the create contexts are not empty or invalid, which can lead to out- of-bounds memory access. This can be seen easily by trying to read a file from a OSX 10.11 SMB3 server. Here is sample crash output: BUG: unable to handle kernel paging request at ffff8800a1a77cc6 IP: [<ffffffff8828a734>] SMB2_open+0x804/0x960 PGD 8f77067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 3 PID: 2876 Comm: cp Not tainted 4.5.0-rc3.x86_64.1+ #14 Hardware name: NETGEAR ReadyNAS 314 /ReadyNAS 314 , BIOS 4.6.5 10/11/2012 task: ffff880073cdc080 ti: ffff88005b31c000 task.ti: ffff88005b31c000 RIP: 0010:[<ffffffff8828a734>] [<ffffffff8828a734>] SMB2_open+0x804/0x960 RSP: 0018:ffff88005b31fa08 EFLAGS: 00010282 RAX: 0000000000000015 RBX: 0000000000000000 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88007eb8c8b0 RBP: ffff88005b31fad8 R08: 666666203d206363 R09: 6131613030383866 R10: 3030383866666666 R11: 00000000000002b0 R12: ffff8800660fd800 R13: ffff8800a1a77cc2 R14: 00000000424d53fe R15: ffff88005f5a28c0 FS: 00007f7c8a2897c0(0000) GS:ffff88007eb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a1a77cc6 CR3: 000000005b281000 CR4: 00000000000006e0 Stack: ffff88005b31fa70 ffffffff88278789 00000000000001d3 ffff88005f5a2a80 ffffffff00000003 ffff88005d029d00 ffff88006fde05a0 0000000000000000 ffff88005b31fc78 ffff88006fde0780 ffff88005b31fb2f 0000000100000fe0 Call Trace: [<ffffffff88278789>] ? cifsConvertToUTF16+0x159/0x2d0 [<ffffffff8828cf68>] smb2_open_file+0x98/0x210 [<ffffffff8811e80c>] ? __kmalloc+0x1c/0xe0 [<ffffffff882685f4>] cifs_open+0x2a4/0x720 [<ffffffff88122cef>] do_dentry_open+0x1ff/0x310 [<ffffffff88268350>] ? cifsFileInfo_get+0x30/0x30 [<ffffffff88123d92>] vfs_open+0x52/0x60 [<ffffffff88131dd0>] path_openat+0x170/0xf70 [<ffffffff88097d48>] ? remove_wait_queue+0x48/0x50 [<ffffffff88133a29>] do_filp_open+0x79/0xd0 [<ffffffff8813f2ca>] ? __alloc_fd+0x3a/0x170 [<ffffffff881240c4>] do_sys_open+0x114/0x1e0 [<ffffffff881241a9>] SyS_open+0x19/0x20 [<ffffffff8896e257>] entry_SYSCALL_64_fastpath+0x12/0x6a Code: 4d 8d 6c 07 04 31 c0 4c 89 ee e8 47 6f e5 ff 31 c9 41 89 ce 44 89 f1 48 c7 c7 28 b1 bd 88 31 c0 49 01 cd 4c 89 ee e8 2b 6f e5 ff <45> 0f b7 75 04 48 c7 c7 31 b1 bd 88 31 c0 4d 01 ee 4c 89 f6 e8 RIP [<ffffffff8828a734>] SMB2_open+0x804/0x960 RSP <ffff88005b31fa08> CR2: ffff8800a1a77cc6 ---[ end trace d9f69ba64feee469 ]--- Signed-off-by: Justin Maggard <jmaggard@netgear.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2016-02-29ext4: Fix data exposure after failed AIO DIOJan Kara
When AIO DIO fails e.g. due to IO error, we must not convert unwritten extents as that will expose uninitialized data. Handle this case by clearing unwritten flag from io_end in case of error and thus preventing extent conversion. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_last(): ELOOP failure exit should be done after leaving RCU mode should_follow_link(): validate ->d_seq after having decided to follow namei: ->d_inode of a pinned dentry is stable only for positives do_last(): don't let a bogus return value from ->open() et.al. to confuse us fs: return -EOPNOTSUPP if clone is not supported hpfs: don't truncate the file when delete fails
2016-02-27do_last(): ELOOP failure exit should be done after leaving RCU modeAl Viro
... or we risk seeing a bogus value of d_is_symlink() there. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27should_follow_link(): validate ->d_seq after having decided to followAl Viro
... otherwise d_is_symlink() above might have nothing to do with the inode value we've got. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27namei: ->d_inode of a pinned dentry is stable only for positivesAl Viro
both do_last() and walk_component() risk picking a NULL inode out of dentry about to become positive, *then* checking its flags and seeing that it's not negative anymore and using (already stale by then) value they'd fetched earlier. Usually ends up oopsing soon after that... Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27do_last(): don't let a bogus return value from ->open() et.al. to confuse usAl Viro
... into returning a positive to path_openat(), which would interpret that as "symlink had been encountered" and proceed to corrupt memory, etc. It can only happen due to a bug in some ->open() instance or in some LSM hook, etc., so we report any such event *and* make sure it doesn't trick us into further unpleasantness. Cc: stable@vger.kernel.org # v3.6+, at least Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27fs: return -EOPNOTSUPP if clone is not supportedChristoph Hellwig
-EBADF is a rather confusing error if an operations is not supported, and nfsd gets rather upset about it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27hpfs: don't truncate the file when delete failsMikulas Patocka
The delete opration can allocate additional space on the HPFS filesystem due to btree split. The HPFS driver checks in advance if there is available space, so that it won't corrupt the btree if we run out of space during splitting. If there is not enough available space, the HPFS driver attempted to truncate the file, but this results in a deadlock since the commit 7dd29d8d865efdb00c0542a5d2c87af8c52ea6c7 ("HPFS: Introduce a global mutex and lock it on every callback from VFS"). This patch removes the code that tries to truncate the file and -ENOSPC is returned instead. If the user hits -ENOSPC on delete, he should try to delete other files (that are stored in a leaf btree node), so that the delete operation will make some space for deleting the file stored in non-leaf btree node. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dax: move writeback calls into the filesystems dax: give DAX clearing code correct bdev ext4: online defrag not supported with DAX ext2, ext4: only set S_DAX for regular inodes block: disable block device DAX by default ocfs2: unlock inode if deleting inode from orphan fails mm: ASLR: use get_random_long() drivers: char: random: add get_random_long() mm: numa: quickly fail allocations for NUMA balancing on full nodes mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27Merge tag 'tags/ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext2/4 DAX fix from Ted Ts'o: "This fixes a file system corruption bug with DAX" * tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
2016-02-27ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()Ross Zwisler
As it is currently written ext4_dax_mkwrite() assumes that the call into __dax_mkwrite() will not have to do a block allocation so it doesn't create a journal entry. For a read that creates a zero page to cover a hole followed by a write that actually allocates storage this is incorrect. The ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls get_blocks() to allocate storage. Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault() as this function already has all the logic needed to allocate a journal entry and call __dax_fault(). Also update the ext2 fault handlers in this same way to remove duplicate code and keep the logic between ext2 and ext4 the same. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-27dax: move writeback calls into the filesystemsRoss Zwisler
Previously calls to dax_writeback_mapping_range() for all DAX filesystems (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range(). dax_writeback_mapping_range() needs a struct block_device, and it used to get that from inode->i_sb->s_bdev. This is correct for normal inodes mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time files. Instead, call dax_writeback_mapping_range() directly from the filesystem ->writepages function so that it can supply us with a valid block device. This also fixes DAX code to properly flush caches in response to sync(2). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27dax: give DAX clearing code correct bdevRoss Zwisler
dax_clear_blocks() needs a valid struct block_device and previously it was using inode->i_sb->s_bdev in all cases. This is correct for normal inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time devices. Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change its arguments to take a bdev and a sector instead of an inode and a block. This better reflects what the function does, and it allows the filesystem and raw block device code to pass in an appropriate struct block_device. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27ext4: online defrag not supported with DAXRoss Zwisler
Online defrag operations for ext4 are hard coded to use the page cache. See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page() When combined with DAX I/O, which circumvents the page cache, this can result in data corruption. This was observed with xfstests ext4/307 and ext4/308. Fix this by only allowing online defrag for non-DAX files. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27ext2, ext4: only set S_DAX for regular inodesRoss Zwisler
When S_DAX is set on an inode we assume that if there are pages attached to the mapping (mapping->nrpages != 0), those pages are clean zero pages that were used to service reads from holes. Any dirty data associated with the inode should be in the form of DAX exceptional entries (mapping->nrexceptional) that is written back via dax_writeback_mapping_range(). With the current code, though, this isn't always true. For example, ext2 and ext4 directory inodes can have S_DAX set, but have their dirty data stored as dirty page cache entries. For these types of inodes, having S_DAX set doesn't really make sense since their I/O doesn't actually happen through the DAX code path. Instead, only allow S_DAX to be set for regular inodes for ext2 and ext4. This allows us to have strict DAX vs non-DAX paths in the writeback code. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27block: disable block device DAX by defaultDan Williams
The recent *sync enabling discovered that we are inserting into the block_device pagecache counter to the expectations of the dirty data tracking for dax mappings. This can lead to data corruption. We want to support DAX for block devices eventually, but it requires wider changes to properly manage the pagecache. dump_stack+0x85/0xc2 dax_writeback_mapping_range+0x60/0xe0 blkdev_writepages+0x3f/0x50 do_writepages+0x21/0x30 __filemap_fdatawrite_range+0xc6/0x100 filemap_write_and_wait+0x4a/0xa0 set_blocksize+0x70/0xd0 sb_set_blocksize+0x1d/0x50 ext4_fill_super+0x75b/0x3360 mount_bdev+0x180/0x1b0 ext4_mount+0x15/0x20 mount_fs+0x38/0x170 Mark the support broken so its disabled by default, but otherwise still available for testing. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27ocfs2: unlock inode if deleting inode from orphan failsGuozhonghua
When doing append direct io cleanup, if deleting inode fails, it goes out without unlocking inode, which will cause the inode deadlock. This issue was introduced by commit cf1776a9e834 ("ocfs2: fix a tiny race when truncate dio orohaned entry"). Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27mm: ASLR: use get_random_long()Daniel Cashman
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-26f2fs: fix to avoid deadlock when merging inline dataChao Yu
When testing with fsstress, kworker and user threads were both blocked: INFO: task kworker/u16:1:16580 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u16:1 D ffff8803f2595390 0 16580 2 0x00000000 Workqueue: writeback bdi_writeback_workfn (flush-251:0) ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000 Call Trace: [<ffffffff816fe9d9>] schedule+0x29/0x70 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs] [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs] [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs] [<ffffffff811473b3>] do_writepages+0x23/0x40 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0 [<ffffffff810750ce>] kthread+0xde/0xf0 [<ffffffff817093f8>] ret_from_fork+0x58/0x90 fsstress thread stack: [<ffffffff81139f0e>] sleep_on_page+0xe/0x20 [<ffffffff81139ef7>] __lock_page+0x67/0x70 [<ffffffff8113b100>] find_lock_page+0x50/0x80 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs] [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs] [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs] [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs] [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20 [<ffffffff811d3291>] do_fsync+0x41/0x70 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The reason of this issue is: CPU0: CPU1: - f2fs_write_data_pages - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all - down_write(sbi->cp_rwsem) - lock_page(page) - f2fs_write_data_page - sync_node_pages - flush_inline_data - pagecache_get_page(page, GFP_LOCK) - f2fs_lock_op - down_read(sbi->cp_rwsem) This patch alters to use trylock_page in flush_inline_data to fix this ABBA deadlock issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26f2fs: introduce f2fs_flush_merged_bios for cleanupChao Yu
Add a new helper f2fs_flush_merged_bios to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26f2fs: introduce f2fs_update_data_blkaddr for cleanupChao Yu
Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26f2fs crypto: fix incorrect positioning for GCing encrypted data pageChao Yu
For now, flow of GCing an encrypted data page: 1) try to grab meta page in meta inode's mapping with index of old block address of that data page 2) load data of ciphertext into meta page 3) allocate new block address 4) write the meta page into new block address 5) update block address pointer in direct node page. Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to check and wait on GCed encrypted data cached in meta page writebacked in order to avoid inconsistence among data page cache, meta page cache and data on-disk when updating. However, we will use new block address updated in step 5) as an index to lookup meta page in inner bio buffer. That would be wrong, and we will never find the GCing meta page, since we use the old block address as index of that page in step 1). This patch fixes the issue by adjust the order of step 1) and step 3), and in step 1) grab page with index generated in step 3). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26Orangefs: update orangefs.txtMike Marshall
Al Viro has cleaned up the way ops are processed and waited for, now orangefs.txt has an overview of how it works. Several recent related commits have added to the comments in the code as well. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26Orangefs: code sanitation.Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26orangefs: remove unused 'diff' functionArnd Bergmann
orangefs contains a helper function to calculate the difference between two timeval structures. We are trying to remove all instances of timespec from the kernel, and this one is not used at all, so let's remove it now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26orangefs: avoid time conversion functionArnd Bergmann
The new orangefs code uses a helper function to read a time field to its private structures from struct iattr. This will conflict with the move to 64-bit timestamps in the kernel and is generally not necessary. This replaces the conversion with a simple cast to time64_t that shows what is going on. As the orangefs-internal representation already uses 64-bit timestamps, there should be no ambiguity to negative values, and the cast ensures that we treat them as times before 1970 on both 32-bit and 64-bit architectures, rather than times after 2038. This patch keeps that behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26Merge branch 'dev/control-ioctl' into for-chris-4.6David Sterba
2016-02-26Merge branch 'misc-4.6' into for-chris-4.6David Sterba
# Conflicts: # fs/btrfs/file.c
2016-02-26Merge branch 'cleanups-4.6' into for-chris-4.6David Sterba
2016-02-26Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6David Sterba
2016-02-26Merge branch 'foreign/josef/space-updates' into for-chris-4.6David Sterba
2016-02-26Merge branch 'foreign/zhaolei/reada' into for-chris-4.6David Sterba
2016-02-26Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6David Sterba
2016-02-26Merge branch 'dev/rename-keys' into for-chris-4.6David Sterba
2016-02-26Merge branch 'dev/gfp-flags' into for-chris-4.6David Sterba
2016-02-26Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6David Sterba
# Conflicts: # fs/btrfs/file.c
2016-02-26configfs: Replace CURRENT_TIME by current_fs_time()Deepa Dinamani
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-02-25f2fs: fix incorrect upper bound when iterating inode mapping treeChao Yu
1. Inode mapping tree can index page in range of [0, ULONG_MAX], however, in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX], result in miss hitting in page cache. 2. filemap_fdatawait_range accepts range parameters in unit of bytes, so the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX] as range for waiting on writeback, big number of pages will not be covered. This patch corrects above two issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-25Fix directory hardlinks from deleted directoriesDavid Woodhouse
When a directory is deleted, we don't take too much care about killing off all the dirents that belong to it — on the basis that on remount, the scan will conclude that the directory is dead anyway. This doesn't work though, when the deleted directory contained a child directory which was moved *out*. In the early stages of the fs build we can then end up with an apparent hard link, with the child directory appearing both in its true location, and as a child of the original directory which are this stage of the mount process we don't *yet* know is defunct. To resolve this, take out the early special-casing of the "directories shall not have hard links" rule in jffs2_build_inode_pass1(), and let the normal nlink processing happen for directories as well as other inodes. Then later in the build process we can set ic->pino_nlink to the parent inode#, as is required for directories during normal operaton, instead of the nlink. And complain only *then* about hard links which are still in evidence even after killing off all the unreachable paths. Reported-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
2016-02-25jffs2: Fix page lock / f->sem deadlockDavid Woodhouse
With this fix, all code paths should now be obtaining the page lock before f->sem. Reported-by: Szabó Tamás <sztomi89@gmail.com> Tested-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
2016-02-25Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"Thomas Betker
This reverts commit 5ffd3412ae55 ("jffs2: Fix lock acquisition order bug in jffs2_write_begin"). The commit modified jffs2_write_begin() to remove a deadlock with jffs2_garbage_collect_live(), but this introduced new deadlocks found by multiple users. page_lock() actually has to be called before mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because jffs2_write_end() and jffs2_readpage() are called with the page locked, and they acquire c->alloc_sem and f->sem, resp. In other words, the lock order in jffs2_write_begin() was correct, and it is the jffs2_garbage_collect_live() path that has to be changed. Revert the commit to get rid of the new deadlocks, and to clear the way for a better fix of the original deadlock. Reported-by: Deng Chao <deng.chao1@zte.com.cn> Reported-by: Ming Liu <liu.ming50@gmail.com> Reported-by: wangzaiwei <wangzaiwei@top-vision.cn> Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
2016-02-24orangefs: clean up fill_default_sys_attrsMartin Brandenburg
Size and type are read-only and not in the mask. The times were left unset despite being in the mask. We zero-fill the times since the server will fill them in and we will get the correct time when we fill the inode with getattr. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24orangefs: we never lookup with sym_follow setMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24orangefs: remove vestigial async io codeMartin Brandenburg
I have verified that there is nothing in the userspace daemon version we are implementing this protocol against that ever looks at this field. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24orangefs: use ORANGEFS_NAME_LEN everywhere; remove ORANGEFS_NAME_MAXMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>