summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-04-25f2fs: fix multiple f2fs_add_link() having same name for inline dentrySheng Yong
Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name) does not cover the scenario where inline dentry is enabled. In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will lookup dentries one more time. This patch fixes it by moving the assigment of current task to a upper level to cover both normal and inline dentry. Cc: <stable@vger.kernel.org> Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name) Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-25nfsd: stricter decoding of write-like NFSv2/v3 opsJ. Bruce Fields
The NFSv2/v3 code does not systematically check whether we decode past the end of the buffer. This generally appears to be harmless, but there are a few places where we do arithmetic on the pointers involved and don't account for the possibility that a length could be negative. Add checks to catch these. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Reviewed-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25nfsd4: minor NFSv2/v3 write decoding cleanupJ. Bruce Fields
Use a couple shortcuts that will simplify a following bugfix. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25nfsd: check for oversized NFSv2/v3 argumentsJ. Bruce Fields
A client can append random data to the end of an NFSv2 or NFSv3 RPC call without our complaining; we'll just stop parsing at the end of the expected data and ignore the rest. Encoded arguments and replies are stored together in an array of pages, and if a call is too large it could leave inadequate space for the reply. This is normally OK because NFS RPC's typically have either short arguments and long replies (like READ) or long arguments and short replies (like WRITE). But a client that sends an incorrectly long reply can violate those assumptions. This was observed to cause crashes. Also, several operations increment rq_next_page in the decode routine before checking the argument size, which can leave rq_next_page pointing well past the end of the page array, causing trouble later in svc_free_pages. So, following a suggestion from Neil Brown, add a central check to enforce our expectation that no NFSv2/v3 call has both a large call and a large reply. As followup we may also want to rewrite the encoding routines to check more carefully that they aren't running off the end of the page array. We may also consider rejecting calls that have any extra garbage appended. That would be safer, and within our rights by spec, but given the age of our server and the NFS protocol, and the fact that we've never enforced this before, we may need to balance that against the possibility of breaking some oddball client. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25NFSv3: nfs3_nlm_alloc_call should be declared staticTrond Myklebust
Fix compiler warnings. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25block: remove block_device_operations ->direct_access()Dan Williams
Now that all the producers and consumers of dax interfaces have been converted to using dax_operations on a dax_device, remove the block device direct_access enabling. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25block, dax: convert bdev_dax_supported() to dax_direct_access()Dan Williams
Kill of the final user of bdev_direct_access() and struct blk_dax_ctl. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25filesystem-dax: convert to dax_direct_access()Dan Williams
Now that a dax_device is plumbed through all dax-capable drivers we can switch from block_device_operations to dax_operations for invoking ->direct_access. This also lets us kill off some usages of struct blk_dax_ctl on the way to its eventual removal. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25Revert "block: use DAX for partition table reads"Dan Williams
commit d1a5f2b4d8a1 ("block: use DAX for partition table reads") was part of a stalled effort to allow dax mappings of block devices. Since then the device-dax mechanism has filled the role of dax-mapping static device ranges. Now that we are moving ->direct_access() from a block_device operation to a dax_inode operation we would need block devices to map and carry their own dax_inode reference. Unless / until we decide to revive dax mapping of raw block devices through the dax_inode scheme, there is no need to carry read_dax_sector(). Its removal in turn allows for the removal of bdev_direct_access() and should have been included in commit 223757016837 ("block_dev: remove DAX leftovers"). Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25ext2, ext4, xfs: retrieve dax_device for iomap operationsDan Williams
In preparation for converting fs/dax.c to use dax_direct_access() instead of bdev_direct_access(), add the plumbing to retrieve the dax_device associated with a given block_device. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25NFS: Don't write back further requests if there is a pending write errorTrond Myklebust
If the server has already returned a fatal write error that the user has not yet received on this file, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25pNFS: Fix use after free issues in pnfs_do_read()Trond Myklebust
The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25ceph: fix recursion between ceph_set_acl() and __ceph_setattr()Yan, Zheng
ceph_set_acl() calls __ceph_setattr() if the setacl operation needs to modify inode's i_mode. __ceph_setattr() updates inode's i_mode, then calls posix_acl_chmod(). The problem is that __ceph_setattr() calls posix_acl_chmod() before sending the setattr request. The get_acl() call in posix_acl_chmod() can trigger a getxattr request. The reply of the getxattr request can restore inode's i_mode to its old value. The set_acl() call in posix_acl_chmod() sees old value of inode's i_mode, so it calls __ceph_setattr() again. Cc: stable@vger.kernel.org # needs backporting for < 4.9 Link: http://tracker.ceph.com/issues/19688 Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Tested-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-04-25xfs: better log intent item refcount checkingDarrick J. Wong
Use ASSERTs on the log intent item refcounts so that we fail noisily if anyone tries to double-free the item. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-25xfs: fix up quotacheck buffer list error handlingBrian Foster
The quotacheck error handling of the delwri buffer list assumes the resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag on the buffers that are dequeued. This can lead to assert failures on buffer release and possibly other locking problems. Move this code to a delwri queue cancel helper function to encapsulate the logic required to properly release buffers from a delwri queue. Update the helper to clear the delwri queue flag and call it from quotacheck. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove xfs_trans_ail_delete_bulkChristoph Hellwig
xfs_iflush_done uses an on-stack variable length array to pass the log items to be deleted to xfs_trans_ail_delete_bulk. On-stack VLAs are a nasty gcc extension that can lead to unbounded stack allocations, but fortunately we can easily avoid them by simply open coding xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller of it except for the single-item xfs_trans_ail_delete. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: don't use bool values in trace buffersChristoph Hellwig
Using bool values produces sparse warnings of this form: fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/xfs/./xfs_trace.h:2307:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) Just use a char instead to fix those up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: fix getfsmap userspace memory corruption while setting OF_LASTDarrick J. Wong
At the end of a getfsmap call, we will set FMR_OF_LAST in the last struct fsmap that was handed in by userspace if we've truly run out of space mapping record (as opposed to simply running out of space in the user array). Unfortunately, fmh_entries is the wrong check for whether or not we've filled out anything in the user array because the ioctl provides that fmh_count==0 sets fmh_entries without filling out the user array. Therefore we end up writing things into user memory areas that we weren't given, and kaboom. Since Christoph amended the getfsmap structure to track the number of fsmap entries we've actually filled out, use that as part of deciding if we have to set the OF_LAST flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-25xfs: fix __user annotations for xfs_ioc_getfsmapChristoph Hellwig
By passing the whole fsmap_head structure and an index we can get the user point annotations right for the embedded variable sized array in struct fsmap_head. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: change idx to unsigned int] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: corruption needs to respect endianess too!Christoph Hellwig
At least if we want to be able to recognize the pattern. Add a missing byte swap to the corruption injection case in xlog_sync. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmapChristoph Hellwig
Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmapChristoph Hellwig
Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: simplify validation of the unwritten extent bitChristoph Hellwig
XFS only supports the unwritten extent bit in the data fork, and only if the file system has a version 5 superblock or the unwritten extent feature bit. We currently have two routines that validate the invariant: xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met, and xfs_validate_extent that triggers and assert in debug build. Both of them iterate over all extents of an inode fork when called, which isn't very efficient. This patch instead adds a new helper that verifies the invariant one extent at a time, and calls it from the places where we iterate over all extents to converted them from or two the in-memory format. The callers then return -EFSCORRUPTED when reading invalid extents from disk, or trigger an assert when writing them to disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove unused values from xfs_exntst_tChristoph Hellwig
We only ever use the normal and unwritten states. And the actual ondisk format (this enum isn't despite being in xfs_format.h) only has space for the unwritten bit anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove the unused XFS_MAXLINK_1 defineChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: more do_div cleanupsEric Sandeen
On some architectures do_div does the pointer compare trick to make sure that we've sent it an unsigned 64-bit number. (Why unsigned? I don't know.) Fix up the few places that squawk about this; in xfs_bmap_wants_extents() we just used a bare int64_t so change that to unsigned. In xfs_adjust_extent_unmap_boundaries() all we wanted was the mod, and we have an xfs-specific function to handle that w/o side effects, which includes proper casting for do_div. In xfs_daddr_to_ag[b]no, we were using the wrong type anyway; XFS_BB_TO_FSBT returns a block in the filesystem, so use xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness from that type as a bonus. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove use of do_div with 32-bit dividend in quotaEric Sandeen
The kbuild test robot caught this; in debug code we have another caller of do_div with a 32-bit dividend (j) which is caught now that we are using the kernel-supplied do_div. None of the values used here are 64-bit; just use simple division. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove the trailing newline used in the fmt parameter of TP_printkHou Tao
The trailing newlines wil lead to extra newlines in the trace file which looks like the following output, so remove them. >kworker/4:1H-1508 [004] .... 47879.101608: xfs_discard_extent: dev 8:0 > >kworker/u16:2-238 [004] .... 47879.101725: xfs_extent_busy_clear: dev 8:0 Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix the getfsmap tracepoints too] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: prevent multi-fsb dir readahead from reading random blocksBrian Foster
Directory block readahead uses a complex iteration mechanism to map between high-level directory blocks and underlying physical extents. This mechanism attempts to traverse the higher-level dir blocks in a manner that handles multi-fsb directory blocks and simultaneously maintains a reference to the corresponding physical blocks. This logic doesn't handle certain (discontiguous) physical extent layouts correctly with multi-fsb directory blocks. For example, consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory block size and a directory with the following extent layout: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..7]: 88..95 0 (88..95) 8 1: [8..15]: 80..87 0 (80..87) 8 2: [16..39]: 168..191 0 (168..191) 24 3: [40..63]: 5242952..5242975 1 (72..95) 24 Directory block 0 spans physical extents 0 and 1, dirblk 1 lies entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because extent 2 is larger than the directory block size, the readahead code erroneously assumes the block is contiguous and issues a readahead based on the physical mapping of the first fsb of the dirblk. This results in read verifier failure and a spurious corruption or crc failure, depending on the filesystem format. Further, the subsequent readahead code responsible for walking through the physical table doesn't correctly advance the physical block reference for dirblk 2. Instead of advancing two physical filesystem blocks, the first iteration of the loop advances 1 block (correctly), but the subsequent iteration advances 2 more physical blocks because the next physical extent (extent 3, above) happens to cover more than dirblk 2. At this point, the higher-level directory block walking is completely off the rails of the actual physical layout of the directory for the respective mapping table. Update the contiguous dirblock logic to consider the current offset in the physical extent to avoid issuing directory readahead to unrelated blocks. Also, update the mapping table advancing code to consider the current offset within the current dirblock to avoid advancing the mapping reference too far beyond the dirblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: handle array index overrun in xfs_dir2_leaf_readbuf()Eric Sandeen
Carlos had a case where "find" seemed to start spinning forever and never return. This was on a filesystem with non-default multi-fsb (8k) directory blocks, and a fragmented directory with extents like this: 0:[0,133646,2,0] 1:[2,195888,1,0] 2:[3,195890,1,0] 3:[4,195892,1,0] 4:[5,195894,1,0] 5:[6,195896,1,0] 6:[7,195898,1,0] 7:[8,195900,1,0] 8:[9,195902,1,0] 9:[10,195908,1,0] 10:[11,195910,1,0] 11:[12,195912,1,0] 12:[13,195914,1,0] ... i.e. the first extent is a contiguous 2-fsb dir block, but after that it is fragmented into 1 block extents. At the top of the readdir path, we allocate a mapping array which (for this filesystem geometry) can hold 10 extents; see the assignment to map_info->map_size. During readdir, we are therefore able to map extents 0 through 9 above into the array for readahead purposes. If we count by 2, we see that the last mapped index (9) is the first block of a 2-fsb directory block. At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill more readahead; the outer loop assumes one full dir block is processed each loop iteration, and an inner loop that ensures that this is so by advancing to the next extent until a full directory block is mapped. The problem is that this inner loop may step past the last extent in the mapping array as it tries to reach the end of the directory block. This will read garbage for the extent length, and as a result the loop control variable 'j' may become corrupted and never fail the loop conditional. The number of valid mappings we have in our array is stored in map->map_valid, so stop this inner loop based on that limit. There is an ASSERT at the top of the outer loop for this same condition, but we never made it out of the inner loop, so the ASSERT never fired. Huge appreciation for Carlos for debugging and isolating the problem. Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_sizeChandan Rajendra
On a ppc64 machine executing overlayfs/019 with xfs as the lower and upper filesystem causes the following call trace, WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420 Modules linked in: CPU: 2 PID: 8034 Comm: fsstress Tainted: G L 4.11.0-rc5-next-20170405 #100 task: c000000631314880 task.stack: c0000003915d4000 NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660 REGS: c0000003915d7570 TRAP: 0700 Tainted: G L (4.11.0-rc5-next-20170405) MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 24004284 XER: 00000000 CFAR: c0000000006f7190 SOFTE: 1 GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600 GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960 GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0 GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8 GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000 GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420 LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 Call Trace: [c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable) [c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0 [c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420 [c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160 [c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130 [c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0 [c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0 [c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100 [c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc Instruction dump: 78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288 2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378 The above problem can also be recreated on a regular xfs filesystem using the command, $ fsstress -d /mnt -l 1000 -n 1000 -p 1000 The reason for the call trace is, 1. When 'reserving' blocks for delayed allocation , XFS reserves more blocks (i.e. past file's current EOF) than required. This is done because XFS assumes that userspace might write more data and hence 'reserving' more blocks might lead to the file's new data being stored contiguously on disk. 2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would then cover the prealloc-ed EOF blocks in addition to the regular blocks. 3. When flushing the dirty blocks to disk, we only flush data till the file's EOF. But before writing out the dirty data, we allocate blocks on the disk for holding the file's new data. This allocation includes the blocks that are part of the 'prealloc EOF blocks'. 4. Later, when the last reference to the inode is being closed, XFS frees the unused 'prealloc EOF blocks' in xfs_inactive(). In step 3 above, When allocating space on disk for the delayed allocation range, the space allocator might sometimes allocate less blocks than required. If such an allocation ends right at the current EOF of the file, We will not be able to clear the "delayed allocation" flag for the 'prealloc EOF blocks', since we won't have dirty buffer heads associated with that range of the file. In such a situation if a Direct I/O read operation is performed on file range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the range [X, Y] and invalidate page cache for that range (Refer to iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains the extent items (which are still cached in memory) for the file range. When doing so we are not supposed to get an extent item with IOMAP_DELALLOC flag set, since the previous "flush" operation should have converted any delayed allocation data in the range [X, Y]. Hence we end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor(). This commit fixes the bug by preventing the read operation from going beyond iomap_dio->i_size. Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove bmap block allocation retriesChristoph Hellwig
Now that reflink operations don't set the firstblock value we don't need the workarounds for non-NULL firstblock values without a prior allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove xfs_bmap_remap_allocChristoph Hellwig
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar to what we do in the space allocator. But the reflink code doesn't touch the allocation btree unlike the normal space allocator, so we couldn't care less about the state of the AGFL. So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: introduce xfs_bmapi_remapChristoph Hellwig
Add a new helper to be used for reflink extent list additions instead of funneling them through xfs_bmapi_write and overloading the firstblock member in struct xfs_bmalloca and struct xfs_alloc_args. With some small changes to xfs_bmap_remap_alloc this also means we do not need a xfs_bmalloca structure for this case at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: pass individual arguments to xfs_bmap_add_extent_hole_realChristoph Hellwig
For the reflink case we'd much rather pass the required arguments than faking up a struct xfs_bmalloca. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: remove attr fork handling in xfs_bmap_finish_oneChristoph Hellwig
We never do COW operations for the attr fork, so don't pretend we handle them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25xfs: fix integer truncation in xfs_bmap_remap_allocChristoph Hellwig
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a xfs_aglock_t, which truncates the value to 32 bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust
If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25fanotify: don't expose EOPENSTALE to userspaceAmir Goldstein
When delivering an event to userspace for a file on an NFS share, if the file is deleted on server side before user reads the event, user will not get the event. If the event queue contained several events, the stale event is quietly dropped and read() returns to user with events read so far in the buffer. If the event queue contains a single stale event or if the stale event is a permission event, read() returns to user with the kernel internal error code 518 (EOPENSTALE), which is not a POSIX error code. Check the internal return value -EOPENSTALE in fanotify_read(), just the same as it is checked in path_openat() and drop the event in the cases that it is not already dropped. This is a reproducer from Marko Rauhamaa: Just take the example program listed under "man fanotify" ("fantest") and follow these steps: ============================================================== NFS Server NFS Client(1) NFS Client(2) ============================================================== # echo foo >/nfsshare/bar.txt # cat /nfsshare/bar.txt foo # ./fantest /nfsshare Press enter key to terminate. Listening for events. # rm -f /nfsshare/bar.txt # cat /nfsshare/bar.txt read: Unknown error 518 cat: /nfsshare/bar.txt: Operation not permitted ============================================================== where NFS Client (1) and (2) are two terminal sessions on a single NFS Client machine. Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com> Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com> Cc: <linux-api@vger.kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24f2fs: skip encrypted inode in ASYNC IPU policyHou Pengyang
Async request may be throttled in block layer, so page for async may keep WRITE_BACK for a long time. For encrytped inode, we need wait on page writeback no matter if the device supports BDI_CAP_STABLE_WRITES. This may result in a higher waiting page writeback time for async encrypted inode page. This patch skips IPU for encrypted inode's updating write. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: fix out-of free segmentsJaegeuk Kim
This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority"). This patch fixes out of free segments caused by many small file creation by 1) mkfs -s 1 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in skipping to check has_not_enough_free_secs. Another test is done by 1) mkfs -s 12 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta In this case, this patch also fixes wrong block allocation under large section size. Reported-by: William Brana <wbrana@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: improve definition of statistic macrosArnd Bergmann
With a recent addition of f2fs_lookup_extent_tree(), we get a warning about the use of empty macros: fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree': fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body] stat_inc_rbtree_node_hit(sbi); A good way to avoid the warning and make the code more robust is to define all no-op macros as 'do { } while (0)'. Fixes: 54c2258cd63a ("f2fs: extract rb-tree operation infrastructure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reivewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: assign allocation hint for warm/cold dataJaegeuk Kim
This patch gives slower device region to warm/cold data area more eagerly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: fix _IOW usageJaegeuk Kim
This patch fixes wrong _IOW usage. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: add ioctl to flush data from faster device to cold areaJaegeuk Kim
This patch adds an ioctl to flush data in faster device to cold area. User can give device number and number of segments to move. It doesn't move it if there is only one device. The parameter looks like: struct f2fs_flush_device { u32 dev_num; /* device number to flush */ u32 segments; /* # of segments to flush */ }; Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24ext4: Improve comments in ext4_quota_{on|off}()Jan Kara
Improve comments in ext4_quota_{on|off}() to explain that returning success despite ext4_journal_start() failing is deliberate. Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24fsnotify: remove a stray unlockDan Carpenter
We recently shifted this code around, so we're no longer holding the lock on this path. Fixes: 755b5bc681eb ("fsnotify: Remove indirection from mark list addition") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24udf: use kmap_atomic for memcpy copyingFabian Frederick
Use temporary mapping for memory copying operations. To avoid any sleeping problem, mark_inode_dirty(inode) was moved after kunmap() in udf_adinicb_readpage() down_write(&iinfo->i_data_sem) set before kmap_atomic() in udf_expand_file_adinicb() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24udf: use octal for permissionsFabian Frederick
According to commit f90774e1fd27 ("checkpatch: look for symbolic permissions and suggest octal instead") Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-23Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS fixes from Richard Weinberger: "This contains fixes for issues in both UBI and UBIFS: - more O_TMPFILE fallout - RENAME_WHITEOUT regression due to a mis-merge - memory leak in ubifs_mknod() - power-cut problem in UBI's update volume feature" * tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs: ubifs: Fix O_TMPFILE corner case in ubifs_link() ubifs: Fix RENAME_WHITEOUT support ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode ubifs: Fix debug messages for an invalid filename in ubifs_dump_node ubifs: Remove filename from debug messages in ubifs_readdir ubifs: Fix memory leak in error path in ubifs_mknod ubi/upd: Always flush after prepared for an update