path: root/fs
Age  Commit message  (Author)
2016-07-15  nfsd/blocklayout: Make sure calculate signature/designator length aligned  (Kinglong Mee)
These values are all multiples of 4 already, so there's no change in behavior from this patch. But perhaps this will prevent mistakes in the future. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
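As an illustration of the alignment this entry refers to, here is a minimal userspace sketch of rounding a length up to a multiple of 4 (the XDR unit size). It is not the code from the patch, and the helper name `sketch_xdr_align4` is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the patch): round a byte length up
 * to the next multiple of 4. Lengths that are already multiples of 4 are
 * returned unchanged, which matches the "no change in behavior" remark.
 */
size_t sketch_xdr_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}
```

For example, a 13-byte designator would be accounted as 16 bytes, while a 16-byte one stays at 16.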
2016-07-15  xfs: abstract block export operations from nfsd layouts  (Benjamin Coddington)
Instead of creeping pnfs layout configuration into filesystems, move the definition of block-based export operations under a more abstract configuration. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <david@fromorbit.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15  x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2  (H.J. Lu)
Don't use the same syscall numbers for 2 different syscalls:

  534  x32  preadv    compat_sys_preadv64
  535  x32  pwritev   compat_sys_pwritev64
  534  x32  preadv2   compat_sys_preadv2
  535  x32  pwritev2  compat_sys_pwritev2

Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset is passed in one 64-bit register on x32, similar to compat_sys_preadv64() and compat_sys_pwritev64().

Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
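The difference between passing the offset in one 64-bit register and passing it as two halves can be sketched in plain C. This assumes the non-"64" compat entry points receive the offset as separate low and high 32-bit words; both function names below are hypothetical and this is not the kernel implementation.

```c
#include <stdint.h>

/*
 * Hypothetical: rebuild a 64-bit file offset from two 32-bit halves,
 * as an entry point that receives (pos_low, pos_high) would have to do.
 */
uint64_t sketch_pos_from_halves(uint32_t pos_low, uint32_t pos_high)
{
	return ((uint64_t)pos_high << 32) | pos_low;
}

/*
 * Hypothetical: an entry point in the "64" style gets the whole offset
 * in a single 64-bit argument, so no reassembly is needed.
 */
uint64_t sketch_pos_from_register(uint64_t pos)
{
	return pos;
}
```

On an ABI with 64-bit registers, such as x32, the second form avoids splitting and recombining the offset.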
2016-07-15  ext4: verify extent header depth  (Vegard Nossum)
Although an extent tree depth of 5 should be enough for the worst case of 2^32 extents of length 1, the extent tree code does not currently merge nodes which are less than half-full with a sibling node, or shrink the tree depth if possible. So it's possible, at least in theory, for the tree depth to be greater than 5. However, even in the worst case, a tree depth of 32 is highly unlikely, and if the file system is maliciously corrupted, an insanely large eh_depth can cause memory allocation failures that will trigger kernel warnings (here, eh_depth = 65280):

  JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
  CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
  Stack:
   604a8947 625badd8 0002fd09 00000000
   60078643 00000000 62623910 601bf9bc
   62623970 6002fc84 626239b0 900000125
  Call Trace:
   [<6001c2dc>] show_stack+0xdc/0x1a0
   [<601bf9bc>] dump_stack+0x2a/0x2e
   [<6002fc84>] __warn+0x114/0x140
   [<6002fdff>] warn_slowpath_null+0x1f/0x30
   [<60165829>] start_this_handle+0x569/0x580
   [<60165d4e>] jbd2__journal_start+0x11e/0x220
   [<60146690>] __ext4_journal_start_sb+0x60/0xa0
   [<60120a81>] ext4_truncate+0x131/0x3a0
   [<60123677>] ext4_setattr+0x757/0x840
   [<600d5d0f>] notify_change+0x16f/0x2a0
   [<600b2b16>] do_truncate+0x76/0xc0
   [<600c3e56>] path_openat+0x806/0x1300
   [<600c55c9>] do_filp_open+0x89/0xf0
   [<600b4074>] do_sys_open+0x134/0x1e0
   [<600b4140>] SyS_open+0x20/0x30
   [<6001ea68>] handle_syscall+0x88/0x90
   [<600295fd>] userspace+0x3fd/0x500
   [<6001ac55>] fork_handler+0x85/0x90
  ---[ end trace 08b0b88b6387a244 ]---

[ Commit message modified and the extent tree depth check changed from 5 to 32 -- tytso ]

Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
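The kind of sanity check this entry describes can be sketched as follows. This is a userspace illustration only: the limit of 32 comes from the commit message, while the function name, the constant name, and the use of -EIO as the error code are assumptions.

```c
#include <errno.h>
#include <stdint.h>

/* Limit named in the commit message; the macro name is hypothetical. */
#define SKETCH_MAX_EXTENT_DEPTH 32

/*
 * Hypothetical check: reject an on-disk extent header depth that is
 * implausibly large before it is used to size allocations or journal
 * credits. Returns 0 if acceptable, a negative errno otherwise
 * (-EIO is a stand-in for whatever the real code returns).
 */
int sketch_check_eh_depth(uint16_t eh_depth)
{
	if (eh_depth > SKETCH_MAX_EXTENT_DEPTH)
		return -EIO;
	return 0;
}
```

With this check, the corrupted value from the trace (eh_depth = 65280) would be rejected up front, while ordinary depths pass.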
2016-07-14  ext4: short-cut orphan cleanup on error  (Vegard Nossum)
If we encounter a filesystem error during orphan cleanup, we should stop. Otherwise, we may end up in an infinite loop where the same inode is processed again and again.

  EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
  EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
  Aborting journal on device loop0-8.
  EXT4-fs (loop0): Remounting filesystem read-only
  EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
  EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
  EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
  EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
  EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
  EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
  EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
  EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
  [...]
  EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
  [...]
  EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
  [...]

See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
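The loop shape this entry describes can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the ext4 code: the struct, its fields, and both function names are hypothetical. The key point is that a failed entry stays on the list, so without the error check the loop would keep revisiting it.

```c
/* Hypothetical stand-in for the state the cleanup loop consults. */
struct sketch_sb {
	int error_fs;   /* set once a filesystem error has been recorded */
	int orphans;    /* entries still on the orphan list */
	int bad_at;     /* list length at which processing fails (0 = never) */
};

/* Process the head of the list; on failure the entry stays on the list. */
static void sketch_process_orphan(struct sketch_sb *sb)
{
	if (sb->orphans == sb->bad_at) {
		sb->error_fs = 1;
		return;
	}
	sb->orphans--;
}

/* Returns the number of processing attempts made. */
int sketch_orphan_cleanup(struct sketch_sb *sb)
{
	int attempts = 0;

	while (sb->orphans > 0) {
		if (sb->error_fs)
			break;  /* stop instead of retrying the same inode forever */
		sketch_process_orphan(sb);
		attempts++;
	}
	return attempts;
}
```

Removing the `error_fs` check in this model would make the loop spin on the failing entry, which mirrors the repeated "orphan list check failed!" lines in the log.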
2016-07-14  ext4: fix reference counting bug on block allocation error  (Vegard Nossum)
If we hit this error when mounted with errors=continue or errors=remount-ro:

  EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata

then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to continue. However, ext4_mb_release_context() is the wrong thing to call here since we are still actually using the allocation context. Instead, just error out. We could retry the allocation, but there is a possibility of getting stuck in an infinite loop instead, so this seems safer.

[ Fixed up so we don't return EAGAIN to userspace. --tytso ]

Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
2016-07-14  NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes"  (Trond Myklebust)
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait() are unenforcible and have livelock potential. Just limit ourselves to flushing out the data. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-14  chardev: add missing line break in pr_warn  (Fengguang Wu)
To fix super long dmesg error lines like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

After fix, it should look like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range
  CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range
  swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

Reported-by: Philip Li <philip.li@intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
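The effect of the missing line break can be reproduced in userspace with `snprintf`. This is only a sketch: the function name is hypothetical, and the format string is modeled on the message text quoted in this entry rather than copied from the kernel source.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace analogue of the warning: format the message
 * into buf, terminated with '\n' so that the next log message starts
 * on its own line. Returns the snprintf() result.
 */
int sketch_format_chrdev_warning(char *buf, size_t size,
				 const char *name, unsigned int major)
{
	return snprintf(buf, size,
			"CHRDEV \"%s\" major number %u goes below the dynamic allocation range\n",
			name, major);
}
```

Without the trailing `\n` in the format string, consecutive messages run together exactly as in the "before" sample.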
2016-07-13  nfsd: Fix some indent inconsistancy  (Christophe JAILLET)
Silence a few smatch warnings about indentation. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13  nfsd: Correct a comment for NFSD_MAY_ defines location  (Oleg Drokin)
Those are now defined in fs/nfsd/vfs.h Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13  nfsd: Add a super simple flex file server  (Tom Haynes)
Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2) is also the ds (NFSv3). I.e., the metadata and the data file are the exact same file. This will allow testing of the flex file client. Simply add the "pnfs" export option to your export in /etc/exports and mount from a client that supports flex files. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13  nfsd: flex file device id encoding will need the server address  (Tom Haynes)
Signed-off-by: Tom Haynes <loghyr@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13  nfsd: implement machine credential support for some operations  (Andrew Elble)
This addresses the conundrum referenced in RFC5661 18.35.3, and will allow clients to return state to the server using the machine credentials. The biggest part of the problem is that we need to allow the client to send a compound op with integrity/privacy on mounts that don't have it enabled. Add server support for properly decoding and using spo_must_enforce and spo_must_allow bits. Add support for machine credentials to be used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN, and TEST/FREE STATEID. Implement a check so as to not throw WRONGSEC errors when these operations are used if integrity/privacy isn't turned on. Without this, Linux clients with credentials that expired while holding delegations were getting stuck in an endless loop. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13  nfsd: allow mach_creds_match to be used more broadly  (Andrew Elble)
Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-12  pmem: kill __pmem address space  (Dan Williams)
The __pmem address space was meant to annotate codepaths that touch persistent memory and need to coordinate a call to wmb_pmem(). Now that wmb_pmem() is gone, there is little need to keep this annotation. Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12  fs/dax: remove wmb_pmem()  (Dan Williams)
Flushing posted-write queues is now deferred to REQ_FLUSH context, or otherwise handled by an ADR event at the platform level. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12  cifs: Check for existing directory when opening file with O_CREAT  (Sachin Prabhu)
When opening a file with O_CREAT flag, check to see if the file opened is an existing directory. This prevents the directory from being opened which subsequently causes a crash when the close function for directories cifs_closedir() is called which frees up the file->private_data memory while the file is still listed on the open file list for the tcon. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reported-by: Xiaoli Feng <xifeng@redhat.com>
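The check described here can be sketched as a small decision function. This is an assumption-laden model, not the cifs code: the enum, the function name, and the choice of -EISDIR as the error are all hypothetical.

```c
#include <errno.h>

/* Hypothetical classification of what the server-side open returned. */
enum sketch_inode_type {
	SKETCH_REGULAR_FILE,
	SKETCH_DIRECTORY
};

/*
 * Hypothetical post-open check for an O_CREAT open: if the object that
 * was opened turns out to be an existing directory, refuse it so that
 * the directory close path is never run on a file-style open.
 * -EISDIR is an assumed error code for illustration.
 */
int sketch_check_created_object(enum sketch_inode_type type)
{
	if (type == SKETCH_DIRECTORY)
		return -EISDIR;
	return 0;
}
```

Failing early in this way keeps a directory from ever being placed on the open file list as though it were a regular file.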
2016-07-12  GFS2: Check rs_free with rd_rsspin protection  (Bob Peterson)
For the last process to close a file opened for write, function gfs2_rsqa_delete was deleting the file's inode's block reservation out of the rgrp reservations tree. Then it was checking to make sure rs_free was 0, but it was performing the check outside the protection of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed to prevent a race between the process freeing the reservation and another who is allocating a new set of blocks inside the same rgrp for the same inode, thus changing its value. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-12  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  posix_acl: de-union a_refcount and a_rcu
  nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
  Use the right predicate in ->atomic_open() instances
2016-07-11  Add MF-Symlinks support for SMB 2.0  (Sachin Prabhu)
We should be able to use the same helper functions used for SMB 2.1 and later versions. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-11  NFS: Don't drop CB requests with invalid principals  (Chuck Lever)
Before commit 778be232a207 ("NFS do not find client in NFSv4 pg_authenticate"), the Linux callback server replied with RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB request. Let's restore that behavior so the server has a chance to do something useful about it, and provide a warning that helps admins correct the problem. Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-10  ext4 crypto: migrate into vfs's crypto engine  (Jaegeuk Kim)
This patch removes most of the internal crypto code, then modifies and adds some ext4-specific crypto code to use the generic facility. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-10  configfs: don't set buffer_needs_fill to zero if show() returns error  (Tal Shorer)
A configfs attribute's show() callback is called once, the first time the user attempts to read from it. If it returns an error, that error is returned to the user. However, the open file's buffer_needs_fill is still set to zero, and subsequent read() calls will find an empty buffer that doesn't need filling and return 0 to the user. This could give the user the wrong impression that the attribute was read successfully. Fix this by not clearing buffer_needs_fill if show() returns an error, so that subsequent read() calls invoke show() again and either get an error again or get data. Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
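The fixed behavior can be modeled in userspace as follows. This is a sketch under stated assumptions, not the configfs source: the struct layout, the callback type, the two demo callbacks, and all `sketch_` names are hypothetical. Only the rule "leave the needs-fill flag set when show() fails" comes from the entry above.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical model of the per-open read buffer. */
struct sketch_buffer {
	int  needs_fill;   /* nonzero: show() must run before data is served */
	long count;        /* bytes produced by the last successful show() */
	char page[64];
};

typedef long (*sketch_show_fn)(char *page);

/* Demo callbacks: one that fails, one that produces three bytes. */
long sketch_show_fail(char *page) { (void)page; return -EIO; }
long sketch_show_ok(char *page)   { memcpy(page, "42\n", 3); return 3; }

/* Fill the buffer via show(); only mark it filled on success. */
int sketch_fill_read_buffer(struct sketch_buffer *buf, sketch_show_fn show)
{
	long count = show(buf->page);

	if (count < 0)
		return (int)count;  /* needs_fill stays set: next read retries */
	buf->needs_fill = 0;
	buf->count = count;
	return 0;
}
```

In this model, a failed first read followed by a second read calls the callback again, rather than returning 0 bytes as if the attribute were empty.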
2016-07-08  Merge branch 'topic/cec' into patchwork  (Mauro Carvalho Chehab)
* topic/cec:
  [media] DocBook/media: add CEC documentation
  [media] s5p_cec: get rid of an unused var
  [media] move s5p-cec to staging
  [media] vivid: add CEC emulation
  [media] cec: s5p-cec: Add s5p-cec driver
  [media] cec: adv7511: add cec support
  [media] cec: adv7842: add cec support
  [media] cec: adv7604: add cec support
  [media] cec: add compat32 ioctl support
  [media] cec/TODO: add TODO file so we know why this is still in staging
  [media] cec: add HDMI CEC framework (api)
  [media] cec: add HDMI CEC framework (adapter)
  [media] cec: add HDMI CEC framework (core)
  [media] cec-funcs.h: static inlines to pack/unpack CEC messages
  [media] cec.h: add cec header
  [media] cec-edid: add module for EDID CEC helper functions
  [media] cec.txt: add CEC framework documentation
  [media] rc: Add HDMI CEC protocol handling
2016-07-08  f2fs: avoid mark_inode_dirty  (Jaegeuk Kim)
Let's check inode's dirtiness before calling mark_inode_dirty. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: move i_size_write in f2fs_write_end  (Jaegeuk Kim)
We don't need to do i_size_write under page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: fix to avoid redundant discard during fstrim  (Chao Yu)
With the test steps below, f2fs issues redundant discards during fstrim. The reason is that we issue discards both for prefree segments and for the consecutive freed regions the user wants to trim, and the regions they cover partly overlap. Change this so that no discards are issued for prefree segments in the trimmed range.

  1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
  2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
  3. dd if=/dev/zero of=/mnt/f2fs/a bs=2M count=1
  4. dd if=/dev/zero of=/mnt/f2fs/b bs=1M count=1
  5. sync
  6. rm /mnt/f2fs/a /mnt/f2fs/b
  7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

Before:
  <...>-5428 [001] ...1 9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
  <...>-5428 [001] ...1 9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

After:
  <...>-6764 [000] ...1 9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: avoid mismatching block range for discard  (Yunlei He)
This patch skips discarding block ranges that are smaller than trim_minlen and cannot be merged with a neighbour. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: fix incorrect f_bfree calculation in ->statfs  (Chao Yu)
As the manual describes, f_bfree indicates the total free blocks in the fs. In f2fs, it includes two parts: visible free blocks and over-provision blocks. This patch corrects the calculation.

  fsblkcnt_t f_bfree; /* free blocks in fs */

Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
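The arithmetic described here is small enough to sketch directly. The struct, field names, and function names below are hypothetical; only the rule that f_bfree is the sum of visible free blocks and over-provision blocks comes from this entry. Treating the visible free count alone as the figure an unprivileged user can allocate is an assumption made for contrast.

```c
#include <stdint.h>

/* Hypothetical summary of the block counts involved. */
struct sketch_f2fs_space {
	uint64_t visible_free_blocks;    /* free blocks a user can see */
	uint64_t over_provision_blocks;  /* reserved over-provision space */
};

/* Total free blocks in the fs: both parts added together. */
uint64_t sketch_f_bfree(const struct sketch_f2fs_space *s)
{
	return s->visible_free_blocks + s->over_provision_blocks;
}

/* Assumed for contrast: blocks available to users exclude over-provision. */
uint64_t sketch_f_bavail(const struct sketch_f2fs_space *s)
{
	return s->visible_free_blocks;
}
```

With 1000 visible free blocks and 50 over-provision blocks, this model reports 1050 total free blocks and 1000 available ones.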
2016-07-08  f2fs: use percpu_rw_semaphore  (Jaegeuk Kim)
This patch replaces rw_semaphore with percpu_rw_semaphore for:
  sbi->cp_rwsem
  nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: skip to check the block address of node page  (Jaegeuk Kim)
If the node page is up-to-date, it should be alive. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: shrink critical region in spin_lock  (Jaegeuk Kim)
This patch shrinks the critical region in spin_lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: call SetPageUptodate if needed  (Jaegeuk Kim)
SetPageUptodate() issues a memory barrier, resulting in performance degradation. Let's avoid that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
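The "only if needed" pattern can be illustrated with a toy model that counts how often the expensive path runs. Everything here is hypothetical (the struct, the counter standing in for the memory barrier, and the function names); it shows the test-before-set idea, not the f2fs code.

```c
/* Hypothetical page model; 'barriers' counts the costly operations. */
struct sketch_page {
	int uptodate;
	int barriers;
};

/* Stand-in for the unconditional setter, which always pays the cost. */
void sketch_set_uptodate(struct sketch_page *page)
{
	page->barriers++;       /* models the memory barrier */
	page->uptodate = 1;
}

/* Test first, and only take the costly path when the flag is clear. */
void sketch_set_uptodate_if_needed(struct sketch_page *page)
{
	if (!page->uptodate)
		sketch_set_uptodate(page);
}
```

Calling the conditional variant repeatedly on the same page pays the modeled barrier cost once, whereas the unconditional setter pays it on every call.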
2016-07-08  f2fs: introduce f2fs_set_page_dirty_nobuffer  (Jaegeuk Kim)
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer. When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the overall performance. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: remove unnecessary goto statement  (Tiezhu Yang)
When base_addr is NULL, there is no need to call kzfree, it should return -ENOMEM directly. Additionally, it is better to initialize variable 'error' with 0. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: add nodiscard mount option  (Chao Yu)
This patch adds 'nodiscard' mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: fix to redirty page if fail to gc data page  (Chao Yu)
If we fail to move data page during foreground GC, we should give another chance to writeback that page which was set dirty previously by writer. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: fix to detect truncation prior rather than EIO during read  (Chao Yu)
In a synchronized read, after sending out the read request, the reader tries to lock the page in order to wait for the device to finish the read and unlock the page. Meanwhile, a truncater can race with the reader, so once the reader gets the page lock it should check the page's mapping to detect whether someone has truncated the page in the meantime. The reader then has the chance to retry if a truncation was done; otherwise the read can fail on the previous condition check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  f2fs: fix to avoid reading out encrypted data in page cache  (Chao Yu)
For an encrypted inode, if the user overwrites data of the inode, f2fs will read encrypted data into the page cache, and then do the decryption. However, a reader can race with the overwriter, and it will see encrypted data which has not been decrypted by the overwriter yet. Fix it by moving the decrypting work to the background and keeping the page non-uptodate until the data is decrypted.

  Thread A                                Thread B
  - f2fs_file_write_iter
   - __generic_file_write_iter
    - generic_perform_write
     - f2fs_write_begin
      - f2fs_submit_page_bio
                                          - generic_file_read_iter
                                           - do_generic_file_read
                                            - lock_page_killable
                                            - unlock_page
                                            - copy_page_to_iter
                                            hit the encrypted data in updated page
      - lock_page
      - fscrypt_decrypt_page

Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08  Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs  (Linus Torvalds)
Pull eCryptfs fixes from Tyler Hicks:
 "Provide a more concise fix for CVE-2016-1583:
   - Additionally fixes linux-stable regressions caused by the cherry-picking of the original fix

  Some very minor changes that have queued up:
   - Fix typos in code comments
   - Remove unnecessary check for NULL before destroying kmem_cache"

* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: don't allow mmap when the lower fs doesn't support it
  Revert "ecryptfs: forbid opening files without mmap handler"
  ecryptfs: fix spelling mistakes
  eCryptfs: fix typos in comment
  ecryptfs: drop null test before destroy functions
2016-07-08  ecryptfs: don't allow mmap when the lower fs doesn't support it  (Jeff Mahoney)
There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
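The rule in this entry (a stacked filesystem should refuse mmap when the lower file has no mmap handler) can be sketched with a pair of toy structures. The types, the demo handler, and the function names are hypothetical, and returning -ENODEV is an assumption for illustration; this is not the eCryptfs source.

```c
#include <errno.h>

/* Hypothetical, cut-down stand-ins for file operations and files. */
struct sketch_file_ops {
	int (*mmap)(void);   /* absent (null) when mmap is unsupported */
};

struct sketch_file {
	const struct sketch_file_ops *f_op;
};

/* Demo handler for a lower filesystem that does support mmap. */
int sketch_lower_mmap(void)
{
	return 0;
}

/*
 * Hypothetical stacked mmap entry point: do not emulate mmap on top of
 * a lower file that does not offer it natively.
 */
int sketch_stacked_mmap(const struct sketch_file *lower_file)
{
	if (!lower_file->f_op->mmap)
		return -ENODEV;   /* assumed error code for this sketch */
	return 0;                 /* would proceed with the normal mmap path */
}
```

Placing the check in the mmap path, rather than at open time, lets files without an mmap handler still be opened and read normally.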
2016-07-07  Revert "ecryptfs: forbid opening files without mmap handler"  (Jeff Mahoney)
This reverts commit 2f36db71009304b3f0b95afacd8eba1f9f046b87. It fixed a local root exploit but also introduced a dependency on the lower file system implementing an mmap operation just to open a file, which is a bit of a heavy hammer. The right fix is to have mmap depend on the existence of the mmap handler instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-07-07  Merge branch 'for-linus' of git://git.kernel.dk/linux-block  (Linus Torvalds)
Pull block IO fixes from Jens Axboe:
 "Three small fixes that have been queued up and tested for this series:

   - A bug fix for xen-blkfront from Bob Liu, fixing an issue with incomplete requests during migration.

   - A fix for an ancient issue in retrieving the IO priority of a different PID than self, preventing that task from going away while we access it. From Omar.

   - A writeback fix from Tahsin, fixing a case where we'd call ihold() with a zero ref count inode"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix use-after-free in sys_ioprio_get()
  writeback: inode cgroup wb switch should not call ihold()
  xen-blkfront: save uncompleted reqs in blkfront_resume()
2016-07-07  Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs  (Linus Torvalds)
Pull configfs fix from Christoph Hellwig:
 "A fix from Marek for ppos handling in configfs_write_bin_file, which was introduced in Linux 4.5, but didn't have any users until recently"

* tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs:
  configfs: Remove ppos increment in configfs_write_bin_file
2016-07-07  Btrfs: use FLUSH_LIMIT for relocation in reserve_metadata_bytes  (Josef Bacik)
We used to allow you to set FLUSH_ALL and then just wouldn't do things like commit transactions or wait on ordered extents if we noticed you were in a transaction. However now that all the flushing for FLUSH_ALL is asynchronous we've lost the ability to tell, and we could end up deadlocking. So instead use FLUSH_LIMIT in reserve_metadata_bytes in relocation and then return -EAGAIN if we error out to preserve the previous behavior. I've also added an ASSERT() to catch anybody else who tries to do this. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-07  Btrfs: fill relocation block rsv after allocation  (Josef Bacik)
Since we set the reloc control before we've reserved our space for relocation we could race with a root being dirtied and not actually have space to do our init reloc root. So once we've allocated it and set it up go ahead and make our reservation before setting the relocate control, that way anybody who tries to do the reloc root init has space to use. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-07  Btrfs: always use trans->block_rsv for orphans  (Josef Bacik)
This is the case all the time anyway except for relocation which could be doing a reloc root for a non ref counted root, in which case we'd end up with some random block rsv rather than the one we have our reservation in. If there isn't enough space in the block rsv we are trying to steal from we'll BUG() because we expect there to be space for the orphan to make its reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-07  Btrfs: change how we calculate the global block rsv  (Josef Bacik)
Traditionally we've calculated the global block rsv by guessing how much of the metadata used amount was the extent tree, and then taking the data size and figuring out how large the csum tree would have to be to hold that much data. This is imprecise and falls down on MIXED file systems as we can't trust the data used amount. This resulted in failures for xfstests generic/333 because it creates lots of clones, which explodes out the extent tree. Our global reserve calculations were woefully inaccurate in this case which meant we got into a situation where we did not have enough reserved to do our work. We know we only use the global block rsv for the extent, csum, and root trees, so just get the bytes used for these trees and use that as the basis of our global reserve. Since these are not reference counted trees the bytes_used value will be accurate. This fixed the transaction aborts seen with generic/333. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-07  Btrfs: use root when checking need_async_flush  (Josef Bacik)
Instead of doing fs_info->fs_root in need_async_flush, which may not be set during recovery when mounting, just pass the root itself in, which makes more sense as that's what btrfs_calc_reclaim_metadata_size takes. Signed-off-by: Josef Bacik <jbacik@fb.com> Reported-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-07  Btrfs: don't bother kicking async if there's nothing to reclaim  (Josef Bacik)
We do this check when we start the async reclaimer thread, might as well check before we kick it off to save us some cycles. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>