summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-05-23mnt: In umount propagation reparent in a separate passEric W. Biederman
It was observed that in some pathlogical cases that the current code does not unmount everything it should. After investigation it was determined that the issue is that mnt_change_mntpoint can can change which mounts are available to be unmounted during mount propagation which is wrong. The trivial reproducer is: $ cat ./pathological.sh mount -t tmpfs test-base /mnt cd /mnt mkdir 1 2 1/1 mount --bind 1 1 mount --make-shared 1 mount --bind 1 2 mount --bind 1/1 1/1 mount --bind 1/1 1/1 echo grep test-base /proc/self/mountinfo umount 1/1 echo grep test-base /proc/self/mountinfo $ unshare -Urm ./pathological.sh The expected output looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 The output without the fix looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 That last mount in the output was in the propgation tree to be unmounted but was missed because the mnt_change_mountpoint changed it's parent before the walk through the mount propagation tree observed it. Cc: stable@vger.kernel.org Fixes: 1064f874abc0 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") Acked-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-05-21ext4: keep existing extra fields when inode expandsKonstantin Khlebnikov
ext4_expand_extra_isize() should clear only space between old and new size. Fixes: 6dd4ee7cab7e # v2.6.23 Cc: stable@vger.kernel.org Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errorsKonstantin Khlebnikov
I've got another report about breaking ext4 by ENOMEM error returned from ext4_mb_load_buddy() caused by memory shortage in memory cgroup. This time inside ext4_discard_preallocations(). This patch replaces ext4_error() with ext4_warning() where errors returned from ext4_mb_load_buddy() are not fatal and handled by caller: * ext4_mb_discard_group_preallocations() - called before generating ENOSPC, we'll try to discard other group or return ENOSPC into user-space. * ext4_trim_all_free() - just stop trimming and return ENOMEM from ioctl. Some callers cannot handle errors, thus __GFP_NOFAIL is used for them: * ext4_discard_preallocations() * ext4_mb_discard_lg_preallocations() Fixes: adb7ef600cc9 ("ext4: use __GFP_NOFAIL in ext4_free_blocks()") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21ext4: fix off-by-in in loop termination in ext4_find_unwritten_pgoff()Jan Kara
There is an off-by-one error in loop termination conditions in ext4_find_unwritten_pgoff() since 'end' may index a page beyond end of desired range if 'endoff' is page aligned. It doesn't have any visible effects but still it is good to fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21ext4: fix SEEK_HOLEJan Kara
Currently, SEEK_HOLE implementation in ext4 may both return that there's a hole at some offset although that offset already has data and skip some holes during a search for the next hole. The first problem is demostrated by: xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file wrote 57344/57344 bytes at offset 0 56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec) Whence Result HOLE 0 Where we can see that SEEK_HOLE wrongly returned offset 0 as containing a hole although we have written data there. The second problem can be demonstrated by: xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k" -c "seek -h 0" file wrote 57344/57344 bytes at offset 0 56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec) wrote 8192/8192 bytes at offset 131072 8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec) Whence Result HOLE 139264 Where we can see that hole at offsets 56k..128k has been ignored by the SEEK_HOLE call. The underlying problem is in the ext4_find_unwritten_pgoff() which is just buggy. In some cases it fails to update returned offset when it finds a hole (when no pages are found or when the first found page has higher index than expected), in some cases conditions for detecting hole are just missing (we fail to detect a situation where indices of returned pages are not contiguous). Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page indices and also handle all cases where we got less pages then expected in one place and handle it properly there. CC: stable@vger.kernel.org Fixes: c8c0df241cc2719b1262e627f999638411934f60 CC: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21jbd2: preserve original nofs flag during journal restartTahsin Erdogan
When a transaction starts, start_this_handle() saves current PF_MEMALLOC_NOFS value so that it can be restored at journal stop time. Journal restart is a special case that calls start_this_handle() without stopping the transaction. start_this_handle() isn't aware that the original value is already stored so it overwrites it with current value. For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set at the end: jbd2_journal_start() jbd2__journal_restart() jbd2_journal_stop() Make jbd2__journal_restart() restore the original value before calling start_this_handle(). Fixes: 81378da64de6 ("jbd2: mark the transaction context with the scope GFP_NOFS context") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2017-05-21ext4: clear lockdep subtype for quota files on quota offJan Kara
Quota files have special ranking of i_data_sem lock. We inform lockdep about it when turning on quotas however when turning quotas off, we don't clear the lockdep subclass from i_data_sem lock and thus when the inode gets later reused for a normal file or directory, lockdep gets confused and complains about possible deadlocks. Fix the problem by resetting lockdep subclass of i_data_sem on quota off. Cc: stable@vger.kernel.org Fixes: daf647d2dd58cec59570d7698a45b98e580f2076 Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-20Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A small collection of fixes that should go into this cycle. - a pull request from Christoph for NVMe, which ended up being manually applied to avoid pulling in newer bits in master. Mostly fibre channel fixes from James, but also a few fixes from Jon and Vijay - a pull request from Konrad, with just a single fix for xen-blkback from Gustavo. - a fuseblk bdi fix from Jan, fixing a regression in this series with the dynamic backing devices. - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull(). - a request leak fix for drbd from Lars, fixing a regression in the last series with the kref changes. This will go to stable as well" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: release the sq ref on rdma read errors nvmet-fc: remove target cpu scheduling flag nvme-fc: stop queues on error detection nvme-fc: require target or discovery role for fc-nvme targets nvme-fc: correct port role bits nvme: unmap CMB and remove sysfs file in reset path blktrace: fix integer parse fuseblk: Fix warning in super_setup_bdi_name() block: xen-blkback: add null check to avoid null pointer dereference drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
2017-05-19Merge branch 'libnvdimm-for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A couple of compile fixes. With the removal of the ->direct_access() method from block_device_operations in favor of a new dax_device + dax_operations we broke two configurations. The CONFIG_BLOCK=n case is fixed by compiling out the block+dax helpers in the dax core. Configurations with FS_DAX=n EXT4=y / XFS=y and DAX=m fail due to the helpers the builtin filesystem needs being in a module, so we stub out the helpers in the FS_DAX=n case." * 'libnvdimm-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case dax: fix false CONFIG_BLOCK dependency
2017-05-19xfs: avoid mount-time deadlock in CoW extent recoveryDarrick J. Wong
If a malicious user corrupts the refcount btree to cause a cycle between different levels of the tree, the next mount attempt will deadlock in the CoW recovery routine while grabbing buffer locks. We can use the ability to re-grab a buffer that was previous locked to a transaction to avoid deadlocks, so do that here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-05-19ovl: mark upper dir with type origin entries "impure"Amir Goldstein
When moving a merge dir or non-dir with copy up origin into a non-merge upper dir (a.k.a pure upper dir), we are marking the target parent dir "impure". ovl_iterate() iterates pure upper dirs directly, because there is no need to filter out whiteouts and merge dir content with lower dir. But for the case of an "impure" upper dir, ovl_iterate() will not be able to iterate the real upper dir directly, because it will need to lookup the origin inode and use it to fill d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-19ovl: remove unused arg from ovl_lookup_temp()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-19ovl: handle rename when upper doesn't support xattrAmir Goldstein
On failure to set opaque/redirect xattr on rename, skip setting xattr and return -EXDEV. On failure to set opaque xattr when creating a new directory, -EIO is returned instead of -EOPNOTSUPP. Any failure to set those xattr will be recorded in super block and then setting any xattr on upper won't be attempted again. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-18Merge remote-tracking branch 'mauro-exp/docbook3' into death-to-docbookJonathan Corbet
Mauro says: This patch series convert the remaining DocBooks to ReST. The first version was originally send as 3 patch series: [PATCH 00/36] Convert DocBook documents to ReST [PATCH 0/5] Convert more books to ReST [PATCH 00/13] Get rid of DocBook The lsm book was added as if it were a text file under Documentation. The plan is to merge it with another file under Documentation/security, after both this series and a security Documentation patch series gets merged. It also adjusts some Sphinx-pedantic errors/warnings on some kernel-doc markups. I also added some patches here to add PDF output for all existing ReST books.
2017-05-18ovl: don't fail copy-up if upper doesn't support xattrMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-18ovl: check on mount time if upper fs supports setting xattrAmir Goldstein
xattr are needed by overlayfs for setting opaque dir, redirect dir and copy up origin. Check at mount time by trying to set the overlay.opaque xattr on the workdir and if that fails issue a warning message. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-18ovl: fix creds leak in copy up error pathAmir Goldstein
Fixes: 42f269b92540 ("ovl: rearrange code in ovl_copy_up_locked()") Cc: <stable@vger.kernel.org> # v4.11 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-17fuseblk: Fix warning in super_setup_bdi_name()Jan Kara
Commit 5f7f7543f52e "fuse: Convert to separately allocated bdi" didn't properly handle fuseblk filesystem. When fuse_bdi_init() is called for that filesystem type, sb->s_bdi is already initialized (by set_bdev_super()) to point to block device's bdi and consequently super_setup_bdi_name() complains about this fact when reseting bdi to the private one. Fix the problem by properly dropping bdi reference in fuse_bdi_init() before creating a private bdi in super_setup_bdi_name(). Fixes: 5f7f7543f52e ("fuse: Convert to separately allocated bdi") Reported-by: Rakesh Pandit <rakesh@tuxera.com> Tested-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-16f2fs: sanity check checkpoint segno and blkoffJin Qian
Make sure segno and blkoff read from raw image are valid. Cc: stable@vger.kernel.org Signed-off-by: Jin Qian <jinqian@google.com> [Jaegeuk Kim: adjust minor coding style] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-16nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"J. Bruce Fields
This reverts commit 51f567777799 "nfsd: check for oversized NFSv2/v3 arguments", which breaks support for NFSv3 ACLs. That patch was actually an earlier draft of a fix for the problem that was eventually fixed by e6838a29ecb "nfsd: check for oversized NFSv2/v3 arguments". But somehow I accidentally left this earlier draft in the branch that was part of my 2.12 pull request. Reported-by: Eryu Guan <eguan@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-16xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMINDarrick J. Wong
There were a number of handwaving complaints that one could "possibly" use inode numbers and extent maps to fingerprint a filesystem hosting multiple containers and somehow use the information to guess at the contents of other containers and attack them. Despite the total lack of any demonstration that this is actually possible, it's easier to restrict access now and broaden it later, so use the rmapbt fsmap backends only if the caller has CAP_SYS_ADMIN. Unprivileged users will just have to make do with only getting the free space and static metadata placement information. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-05-16xfs: bad assertion for delalloc an extent that start at i_sizeZorro Lang
By run fsstress long enough time enough in RHEL-7, I find an assertion failure (harder to reproduce on linux-4.11, but problem is still there): XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c The assertion is in xfs_getbmap() funciton: if (map[i].br_startblock == DELAYSTARTBLOCK && --> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) ASSERT((iflags & BMV_IF_DELALLOC) != 0); When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the startoff is just at EOF. But we only need to make sure delalloc extents that are within EOF, not include EOF. Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-16xfs: fix warnings about unused stack variablesDarrick J. Wong
Reduce stack usage and get rid of compiler warnings by eliminating unused variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-05-16xfs: BMAPX shouldn't barf on inline-format directoriesDarrick J. Wong
When we're fulfilling a BMAPX request, jump out early if the data fork is in local format. This prevents us from hitting a debugging check in bmapi_read and barfing errors back to userspace. The on-disk extent count check later isn't sufficient for IF_DELALLOC mode because da extents are in memory and not on disk. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-05-16xfs: fix indlen accounting error on partial delalloc conversionBrian Foster
The delalloc -> real block conversion path uses an incorrect calculation in the case where the middle part of a delalloc extent is being converted. This is documented as a rare situation because XFS generally attempts to maximize contiguity by converting as much of a delalloc extent as possible. If this situation does occur, the indlen reservation for the two new delalloc extents left behind by the conversion of the middle range is calculated and compared with the original reservation. If more blocks are required, the delta is allocated from the global block pool. This delta value can be characterized as the difference between the new total requirement (temp + temp2) and the currently available reservation minus those blocks that have already been allocated (startblockval(PREV.br_startblock) - allocated). The problem is that the current code does not account for previously allocated blocks correctly. It subtracts the current allocation count from the (new - old) delta rather than the old indlen reservation. This means that more indlen blocks than have been allocated end up stashed in the remaining extents and free space accounting is broken as a result. Fix up the calculation to subtract the allocated block count from the original extent indlen and thus correctly allocate the reservation delta based on the difference between the new total requirement and the unused blocks from the original reservation. Also remove a bogus assert that contradicts the fact that the new indlen reservation can be larger than the original indlen reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-16btrfs: fix incorrect error return ret being passed to mapping_set_errorColin Ian King
The setting of return code ret should be based on the error code passed into function end_extent_writepage and not on ret. Thanks to Liu Bo for spotting this mistake in the original fix I submitted. Detected by CoverityScan, CID#1414312 ("Logically dead code") Fixes: 5dca6eea91653e ("Btrfs: mark mapping with error flag to report errors to userspace") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-05-16btrfs: Make flush bios explicitely syncJan Kara
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-05-16btrfs: fiemap: Cache and merge fiemap extent before submit it to userQu Wenruo
[BUG] Cycle mount btrfs can cause fiemap to return different result. Like: # mount /dev/vdb5 /mnt/btrfs # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file # xfs_io -c "fiemap -v" /mnt/btrfs/file /mnt/test/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 25088..25215 128 0x1 # umount /mnt/btrfs # mount /dev/vdb5 /mnt/btrfs # xfs_io -c "fiemap -v" /mnt/btrfs/file /mnt/test/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 25088..25119 32 0x0 1: [32..63]: 25120..25151 32 0x0 2: [64..95]: 25152..25183 32 0x0 3: [96..127]: 25184..25215 32 0x1 But after above fiemap, we get correct merged result if we call fiemap again. # xfs_io -c "fiemap -v" /mnt/btrfs/file /mnt/test/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 25088..25215 128 0x1 [REASON] Btrfs will try to merge extent map when inserting new extent map. btrfs_fiemap(start=0 len=(u64)-1) |- extent_fiemap(start=0 len=(u64)-1) |- get_extent_skip_holes(start=0 len=64k) | |- btrfs_get_extent_fiemap(start=0 len=64k) | |- btrfs_get_extent(start=0 len=64k) | | Found on-disk (ino, EXTENT_DATA, 0) | |- add_extent_mapping() | |- Return (em->start=0, len=16k) | |- fiemap_fill_next_extent(logic=0 phys=X len=16k) | |- get_extent_skip_holes(start=0 len=64k) | |- btrfs_get_extent_fiemap(start=0 len=64k) | |- btrfs_get_extent(start=16k len=48k) | | Found on-disk (ino, EXTENT_DATA, 16k) | |- add_extent_mapping() | | |- try_merge_map() | | Merge with previous em start=0 len=16k | | resulting em start=0 len=32k | |- Return (em->start=0, len=32K) << Merged result |- Stripe off the unrelated range (0~16K) of return em |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K) ^^^ Causing split fiemap extent. And since in add_extent_mapping(), em is already merged, in next fiemap() call, we will get merged result. [FIX] Here we introduce a new structure, fiemap_cache, which records previous fiemap extent. And will always try to merge current fiemap_cache result before calling fiemap_fill_next_extent(). Only when we failed to merge current fiemap extent with cached one, we will call fiemap_fill_next_extent() to submit cached one. So by this method, we can merge all fiemap extents. It can also be done in fs/ioctl.c, however the problem is if fieinfo->fi_extents_max == 0, we have no space to cache previous fiemap extent. So I choose to merge it in btrfs. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-05-16fs: fix the location of the kernel-api bookMauro Carvalho Chehab
The kernel-api book is now part of the core-api. Update its location. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: update location of filesystems documentationMauro Carvalho Chehab
The filesystem documentation was moved from DocBook to Documentation/filesystems/. Update it at the sources. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: jbd2: escape a string with special chars on a kernel-docMauro Carvalho Chehab
kernel-doc will try to interpret a foo() string, except if properly escaped. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: eventfd: fix identation on kernel-docMauro Carvalho Chehab
Sphinx require explicit tags in order to use a list of possible values, otherwise it produces this error: ./fs/eventfd.c:219: WARNING: Option list ends without a blank line; unexpected unindent. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: add a blank lines on some kernel-doc commentsMauro Carvalho Chehab
Sphinx gets confused when it finds identation without a good reason for it and without a preceding blank line: ./fs/mpage.c:347: ERROR: Unexpected indentation. ./fs/namei.c:4303: ERROR: Unexpected indentation. ./fs/fs-writeback.c:2060: ERROR: Unexpected indentation. No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: jbd2: make jbd2_journal_start() kernel-doc parseableMauro Carvalho Chehab
kernel-doc script expects that a function documentation to be just before the function, otherwise it will be ignored. So, move the kernel-doc markup to the right place. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-15Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A set of minor cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Minor cleanup of xattr query function fs: cifs: transport: Use time_after for time comparison SMB2: Fix share type handling cifs: cifsacl: Use a temporary ops variable to reduce code length Don't delay freeing mids when blocked on slow socket write of request CIFS: silence lockdep splat in cifs_relock_file()
2017-05-15nfsd4: const-ify nfsd4_opsChristoph Hellwig
nfsd4_ops contains function pointers, and marking it as constant avoids it being able to be used as an attach vector for code injections. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15sunrpc: mark all struct svc_version instances as constChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-15sunrpc: mark all struct svc_procinfo instances as constChristoph Hellwig
struct svc_procinfo contains function pointers, and marking it as constant avoids it being able to be used as an attach vector for code injections. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15sunrpc: move pc_count out of struct svc_procinfoChristoph Hellwig
pc_count is the only writeable memeber of struct svc_procinfo, which is a good candidate to be const-ified as it contains function pointers. This patch moves it into out out struct svc_procinfo, and into a separate writable array that is pointed to by struct svc_version. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd4: properly type op_func callbacksChristoph Hellwig
Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe function pointer casts. It also adds two missing structures to struct nfsd4_op.u to facilitate this. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd4: remove nfsd4op_rsizeChristoph Hellwig
Except for a lot of unnecessary casts this typedef only has one user, so remove the casts and expand it in struct nfsd4_operation. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd4: properly type op_get_currentstateid callbacksChristoph Hellwig
Pass union nfsd4_op_u to the op_set_currentstateid callbacks instead of using unsafe function pointer casts. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd4: properly type op_set_currentstateid callbacksChristoph Hellwig
Given the args union in struct nfsd4_op a name, and pass it to the op_set_currentstateid callbacks instead of using unsafe function pointer casts. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15sunrpc: properly type pc_encode callbacksChristoph Hellwig
Drop the resp argument as it can trivially be derived from the rqstp argument. With that all functions now have the same prototype, and we can remove the unsafe casting to kxdrproc_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-15sunrpc: properly type pc_decode callbacksChristoph Hellwig
Drop the argp argument as it can trivially be derived from the rqstp argument. With that all functions now have the same prototype, and we can remove the unsafe casting to kxdrproc_t. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15sunrpc: properly type pc_release callbacksChristoph Hellwig
Drop the p and resp arguments as they are always NULL or can trivially be derived from the rqstp argument. With that all functions now have the same prototype, and we can remove the unsafe casting to kxdrproc_t. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15sunrpc: properly type pc_func callbacksChristoph Hellwig
Drop the argp and resp arguments as they can trivially be derived from the rqstp argument. With that all functions now have the same prototype, and we can remove the unsafe casting to svc_procfunc as well as the svc_procfunc typedef itself. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd: remove the unused PROC() macro in nfs3proc.cChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd: use named initializers in PROC()Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15nfsd4: const-ify nfs_cb_version4Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>