summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-10-06Merge tag 'for-f2fs-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've investigated how f2fs deals with errors given by our fault injection facility. With this, we could fix several corner cases. And, in order to improve the performance, we set inline_dentry by default and enhance the exisiting discard issue flow. In addition, we added f2fs_migrate_page for better memory management. Enhancements: - set inline_dentry by default - improve discard issue flow - add more fault injection cases in f2fs - allow block preallocation for encrypted files - introduce migrate_page callback function - avoid truncating the next direct node block at every checkpoint Bug fixes: - set page flag correctly between write_begin and write_end - missing error handling cases detected by fault injection - preallocate blocks regarding to 4KB alignement correctly - dentry and filename handling of encryption - lost xattrs of directories" * tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits) f2fs: introduce update_ckpt_flags to clean up f2fs: don't submit irrelevant page f2fs: fix to commit bio cache after flushing node pages f2fs: introduce get_checkpoint_version for cleanup f2fs: remove dead variable f2fs: remove redundant io plug f2fs: support checkpoint error injection f2fs: fix to recover old fault injection config in ->remount_fs f2fs: do fault injection initialization in default_options f2fs: remove redundant value definition f2fs: support configuring fault injection per superblock f2fs: adjust display format of segment bit f2fs: remove dirty inode pages in error path f2fs: do not unnecessarily null-terminate encrypted symlink data f2fs: handle errors during recover_orphan_inodes f2fs: avoid gc in cp_error case f2fs: should put_page for summary page f2fs: assign return value in f2fs_gc f2fs: add customized migrate_page callback f2fs: introduce cp_lock to protect updating of ckpt_flags ...
2016-10-06Merge tag 'pstore-v4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Fix bug in module unloading - Switch to always using spinlock over cmpxchg - Explicitly define pstore backend's supported modes - Remove bounce buffer from pmsg - Switch to using memcpy_to/fromio() - Error checking improvements * tag 'pstore-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: move spin_lock_init after kmalloc error checking pstore/ram: Use memcpy_fromio() to save old buffer pstore/ram: Use memcpy_toio instead of memcpy pstore/pmsg: drop bounce buffer pstore/ram: Set pstore flags dynamically pstore: Split pstore fragile flags pstore/core: drop cmpxchg based updates pstore/ramoops: fixup driver removal
2016-10-06Merge tag 'for-linus-4.9-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Miscellaneous improvements: - clean up debugfs globals - remove dead code in sysfs - reorganize duplicated sysfs attribute structs - consolidate sysfs show and store functions - remove duplicated sysfs_ops structures - describe organization of sysfs - make devreq_mutex static - g_orangefs_stats -> orangefs_stats for consistency - rename most remaining global variables Feature negotiation: - enable Orangefs userspace and kernel module to negotiate mutually supported features" * tag 'for-linus-4.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: Revert "orangefs: bump minimum userspace version" orangefs: bump minimum userspace version orangefs: rename most remaining global variables orangefs: g_orangefs_stats -> orangefs_stats for consistency orangefs: make devreq_mutex static orangefs: describe organization of sysfs orangefs: remove duplicated sysfs_ops structures orangefs: consolidate sysfs show and store functions orangefs: reorganize duplicated sysfs attribute structs orangefs: remove dead code in sysfs orangefs: clean up debugfs globals orangefs: do not allow client readahead cache without feature bit orangefs: add features op orangefs: record userspace version for feature compatbility orangefs: add readahead count and size to sysfs orangefs: re-add flush_racache from out-of-tree orangefs: turn param response value into union orangefs: add missing param request ops orangefs: rename remaining bits of mmap readahead cache
2016-10-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This set of changes is a number of smaller things that have been overlooked in other development cycles focused on more fundamental change. The devpts changes are small things that were a distraction until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an trivial regression fix to autofs for the unprivileged mount changes that went in last cycle. A pair of ioctls has been added by Andrey Vagin making it is possible to discover the relationships between namespaces when referring to them through file descriptors. The big user visible change is starting to add simple resource limits to catch programs that misbehave. With namespaces in general and user namespaces in particular allowing users to use more kinds of resources, it has become important to have something to limit errant programs. Because the purpose of these limits is to catch errant programs the code needs to be inexpensive to use as it always on, and the default limits need to be high enough that well behaved programs on well behaved systems don't encounter them. To this end, after some review I have implemented per user per user namespace limits, and use them to limit the number of namespaces. The limits being per user mean that one user can not exhause the limits of another user. The limits being per user namespace allow contexts where the limit is 0 and security conscious folks can remove from their threat anlysis the code used to manage namespaces (as they have historically done as it root only). At the same time the limits being per user namespace allow other parts of the system to use namespaces. Namespaces are increasingly being used in application sand boxing scenarios so an all or nothing disable for the entire system for the security conscious folks makes increasing use of these sandboxes impossible. There is also added a limit on the maximum number of mounts present in a single mount namespace. It is nontrivial to guess what a reasonable system wide limit on the number of mount structure in the kernel would be, especially as it various based on how a system is using containers. A limit on the number of mounts in a mount namespace however is much easier to understand and set. In most cases in practice only about 1000 mounts are used. Given that some autofs scenarious have the potential to be 30,000 to 50,000 mounts I have set the default limit for the number of mounts at 100,000 which is well above every known set of users but low enough that the mount hash tables don't degrade unreaonsably. These limits are a start. I expect this estabilishes a pattern that other limits for resources that namespaces use will follow. There has been interest in making inotify event limits per user per user namespace as well as interest expressed in making details about what is going on in the kernel more visible" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits) autofs: Fix automounts by using current_real_cred()->uid mnt: Add a per mount namespace limit on the number of mounts netns: move {inc,dec}_net_namespaces into #ifdef nsfs: Simplify __ns_get_path tools/testing: add a test to check nsfs ioctl-s nsfs: add ioctl to get a parent namespace nsfs: add ioctl to get an owning user namespace for ns file descriptor kernel: add a helper to get an owning user namespace for a namespace devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts devpts: Remove sync_filesystems devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Simplify devpts_mount by using mount_nodev devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Move parse_mount_options into fill_super userns: When the per user per user namespace limit is reached return ENOSPC userns; Document per user per user namespace limits. mntns: Add a limit on the number of mount namespaces. netns: Add a limit on the number of net namespaces cgroupns: Add a limit on the number of cgroup namespaces ipcns: Add a limit on the number of ipc namespaces ...
2016-10-06Merge tag 'xfs-for-linus-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs and iomap updates from Dave Chinner: "The main things in this update are the iomap-based DAX infrastructure, an XFS delalloc rework, and a chunk of fixes to how log recovery schedules writeback to prevent spurious corruption detections when recovery of certain items was not required. The other main chunk of code is some preparation for the upcoming reflink functionality. Most of it is generic and cleanups that stand alone, but they were ready and reviewed so are in this pull request. Speaking of reflink, I'm currently planning to send you another pull request next week containing all the new reflink functionality. I'm working through a similar process to the last cycle, where I sent the reverse mapping code in a separate request because of how large it was. The reflink code merge is even bigger than reverse mapping, so I'll be doing the same thing again.... Summary for this update: - change of XFS mailing list to linux-xfs@vger.kernel.org - iomap-based DAX infrastructure w/ XFS and ext2 support - small iomap fixes and additions - more efficient XFS delayed allocation infrastructure based on iomap - a rework of log recovery writeback scheduling to ensure we don't fail recovery when trying to replay items that are already on disk - some preparation patches for upcoming reflink support - configurable error handling fixes and documentation - aio access time update race fixes for XFS and generic_file_read_iter" * tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (40 commits) fs: update atime before I/O in generic_file_read_iter xfs: update atime before I/O in xfs_file_dio_aio_read ext2: fix possible integer truncation in ext2_iomap_begin xfs: log recovery tracepoints to track current lsn and buffer submission xfs: update metadata LSN in buffers during log recovery xfs: don't warn on buffers not being recovered due to LSN xfs: pass current lsn to log recovery buffer validation xfs: rework log recovery to submit buffers on LSN boundaries xfs: quiesce the filesystem after recovery on readonly mount xfs: remote attribute blocks aren't really userdata ext2: use iomap to implement DAX ext2: stop passing buffer_head to ext2_get_blocks xfs: use iomap to implement DAX xfs: refactor xfs_setfilesize xfs: take the ilock shared if possible in xfs_file_iomap_begin xfs: fix locking for DAX writes dax: provide an iomap based fault handler dax: provide an iomap based dax read/write path dax: don't pass buffer_head to copy_user_dax dax: don't pass buffer_head to dax_insert_mapping ...
2016-10-06Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: - Correct ARMs dma-mapping to use the correct printk format strings. - Avoid defining OBJCOPYFLAGS globally which upsets lkdtm rodata testing. - Cleanups to ARMs asm/memory.h include. - L2 cache cleanups. - Allow flat nommu binaries to be executed on ARM MMU systems. - Kernel hardening - add more read-only after init annotations, including making some kernel vdso variables const. - Ensure AMBA primecell clocks are appropriately defaulted. - ARM breakpoint cleanup. - Various StrongARM 11x0 and companion chip (SA1111) updates to bring this legacy platform to use more modern APIs for (eg) GPIOs and interrupts, which will allow us in the future to reduce some of the board-level driver clutter and elimate function callbacks into board code via platform data. There still appears to be interest in these platforms! - Remove the now redundant secure_flush_area() API. - Module PLT relocation optimisations. Ard says: This series of 4 patches optimizes the ARM PLT generation code that is invoked at module load time, to get rid of the O(n^2) algorithm that results in pathological load times of 10 seconds or more for large modules on certain STB platforms. - ARMv7M cache maintanence support. - L2 cache PMU support * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (35 commits) ARM: sa1111: provide to_sa1111_device() macro ARM: sa1111: add sa1111_get_irq() ARM: sa1111: clean up duplication in IRQ chip implementation ARM: sa1111: implement a gpio_chip for SA1111 GPIOs ARM: sa1111: move irq cleanup to separate function ARM: sa1111: use devm_clk_get() ARM: sa1111: use devm_kzalloc() ARM: sa1111: ensure we only touch RAB bus type devices when removing ARM: 8611/1: l2x0: add PMU support ARM: 8610/1: V7M: Add dsb before jumping in handler mode ARM: 8609/1: V7M: Add support for the Cortex-M7 processor ARM: 8608/1: V7M: Indirect proc_info construction for V7M CPUs ARM: 8607/1: V7M: Wire up caches for V7M processors with cache support. ARM: 8606/1: V7M: introduce cache operations ARM: 8605/1: V7M: fix notrace variant of save_and_disable_irqs ARM: 8604/1: V7M: Add support for reading the CTR with read_cpuid_cachetype() ARM: 8603/1: V7M: Add addresses for mem-mapped V7M cache operations ARM: 8602/1: factor out CSSELR/CCSIDR operations that use cp15 directly ARM: kernel: avoid brute force search on PLT generation ARM: kernel: sort relocation sections before allocating PLTs ...
2016-10-06exportfs: be careful to only return expected errors.NeilBrown
When nfsd calls fh_to_dentry, it expect ESTALE or ENOMEM as errors. In particular it can be tempting to return ENOENT, but this is not handled well by nfsd. Rather than requiring strict adherence to error code code filesystems, treat all unexpected error codes the same as ESTALE. This is safest. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-06Merge branches 'misc' and 'sa1111-base' into for-linusRussell King
2016-10-06afs: Check for fatal error when in waiting for ack stateDavid Howells
When it's in the waiting-for-ACK state, the AFS filesystem needs to check the result of rxrpc_kernel_recv_data() any time it is notified to see if it is indicating a fatal error. If this is the case, it needs to mark the call completed otherwise the call just sits there and never goes away. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-05xfs: implement swapext for rmap filesystemsDarrick J. Wong
Implement swapext for filesystems that have reverse mapping. Back in the reflink patches, we augmented the bmap code with a 'REMAP' flag that updates only the bmbt and doesn't touch the allocator and implemented log redo items for those two operations. Now we can rewrite extent swapping as a (looong) series of remap operations. This is far less efficient than the fork swapping method implemented in the past, so we only switch this on for rmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: refactor swapext codeDarrick J. Wong
Refactor the swapext function to pull out the fork swapping piece into a separate function. In the next patch we'll add in the bit we need to make it work with rmap filesystems. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: various swapext cleanupsDarrick J. Wong
Replace structure typedefs with struct expressions and fix some whitespace issues that result. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: recognize the reflink feature bitDarrick J. Wong
Add the reflink feature flag to the set of recognized feature flags. This enables users to write to reflink filesystems. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: simulate per-AG reservations being critically lowDarrick J. Wong
Create an error injection point that enables us to simulate being critically low on per-AG block reservations. This should enable us to simulate this specific ENOSPC condition so that we can test falling back to a regular file copy. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: don't mix reflink and DAX mode for nowDarrick J. Wong
Since we don't have a strategy for handling both DAX and reflink, for now we'll just prohibit both being set at the same time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: check for invalid inode reflink flagsDarrick J. Wong
We don't support sharing blocks on the realtime device. Flag inodes with the reflink or cowextsize flags set when the reflink feature is disabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: set a default CoW extent size of 32 blocksDarrick J. Wong
If the admin doesn't set a CoW extent size or a regular extent size hint, default to creating CoW reservations 32 blocks long to reduce fragmentation. Signed-off-by: DarricK J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: convert unwritten status of reverse mappings for shared filesDarrick J. Wong
Provide a function to convert an unwritten extent to a real one and vice versa when shared extents are possible. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: use interval query for rmap alloc operations on shared filesDarrick J. Wong
When it's possible for reverse mappings to overlap (data fork extents of files on reflink filesystems), use the interval query function to find the left neighbor of an extent we're trying to add; and be careful to use the lookup functions to update the neighbors and/or add new extents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: add shared rmap map/unmap/convert log item typesDarrick J. Wong
Wire up some rmap log redo item type codes to map, unmap, or convert shared data block extents. The actual log item recovery comes in a later patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: increase log reservations for reflinkDarrick J. Wong
Increase the log reservations to handle the increased rolling that happens at the end of copy-on-write operations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: garbage collect old cowextsz reservationsDarrick J. Wong
Trim CoW reservations made on behalf of a cowextsz hint if they get too old or we run low on quota, so long as we don't have dirty data awaiting writeback or directio operations in progress. Garbage collection of the cowextsize extents are kept separate from prealloc extent reaping because setting the CoW prealloc lifetime to a (much) higher value than the regular prealloc extent lifetime has been useful for combatting CoW fragmentation on VM hosts where the VMs experience bursty write behaviors and we can keep the utilization ratios low enough that we don't start to run out of space. IOWs, it benefits us to keep the CoW fork reservations around for as long as we can unless we run out of blocks or hit inode reclaim. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: try other AGs to allocate a BMBT blockDarrick J. Wong
Prior to the introduction of reflink, allocating a block and mapping it into a file was performed in a single transaction with a single block reservation, and the allocator was supposed to find enough blocks to allocate the extent and any BMBT blocks that might be necessary (unless we're low on space). However, due to the way copy on write works, allocation and mapping have been split into two transactions, which means that we must be able to handle the case where we allocate an extent for CoW but that AG runs out of free space before the blocks can be mapped into a file, and the mapping requires a new BMBT block. When this happens, look in one of the other AGs for a BMBT block instead of taking the FS down. The same applies to the functions that convert a data fork to extents and later btree format. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: don't allow reflink when the AG is low on spaceDarrick J. Wong
If the AG free space is down to the reserves, refuse to reflink our way out of space. Hopefully userspace will make a real copy and/or go elsewhere. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: preallocate blocks for worst-case btree expansionDarrick J. Wong
To gracefully handle the situation where a CoW operation turns a single refcount extent into a lot of tiny ones and then run out of space when a tree split has to happen, use the per-AG reserved block pool to pre-allocate all the space we'll ever need for a maximal btree. For a 4K block size, this only costs an overhead of 0.3% of available disk space. When reflink is enabled, we have an unfortunate problem with rmap -- since we can share a block billions of times, this means that the reverse mapping btree can expand basically infinitely. When an AG is so full that there are no free blocks with which to expand the rmapbt, the filesystem will shut down hard. This is rather annoying to the user, so use the AG reservation code to reserve a "reasonable" amount of space for rmap. We'll prevent reflinks and CoW operations if we think we're getting close to exhausting an AG's free space rather than shutting down, but this permanent reservation should be enough for "most" users. Hopefully. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch@lst.de: ensure that we invalidate the freed btree buffer] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: create a separate cow extent size hint for the allocatorDarrick J. Wong
Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: unshare a range of blocks via fallocateDarrick J. Wong
Unshare all shared extents if the user calls fallocate with the new unshare mode flag set, so that we can guarantee that a subsequent write will not ENOSPC. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: pass inode instead of file to xfs_reflink_dirty_range, use iomap infrastructure for copy up] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: swap inode reflink flags when swapping inode extentsDarrick J. Wong
When we're swapping the extents of two inodes, be sure to swap the reflink inode flag too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: teach get_bmapx about shared extents and the CoW forkDarrick J. Wong
Teach xfs_getbmapx how to report shared extents and CoW fork contents accurately in the bmap output by querying the refcount btree appropriately. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: add dedupe range vfs functionDarrick J. Wong
Define a VFS function which allows userspace to request that the kernel reflink a range of blocks between two files if the ranges' contents match. The function fits the new VFS ioctl that standardizes the checking for the btrfs EXTENT SAME ioctl. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: add clone file and clone range vfs functionsDarrick J. Wong
Define two VFS functions which allow userspace to reflink a range of blocks between two files or to reflink one file's contents to another. These functions fit the new VFS ioctls that standardize the checking for the btrfs CLONE and CLONE RANGE ioctls. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: reflink extents from one file to anotherDarrick J. Wong
Reflink extents from one file to another; that is to say, iteratively remove the mappings from the destination file, copy the mappings from the source file to the destination file, and increment the reference count of all the blocks that got remapped. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: store in-progress CoW allocations in the refcount btreeDarrick J. Wong
Due to the way the CoW algorithm in XFS works, there's an interval during which blocks allocated to handle a CoW can be lost -- if the FS goes down after the blocks are allocated but before the block remapping takes place. This is exacerbated by the cowextsz hint -- allocated reservations can sit around for a while, waiting to get used. Since the refcount btree doesn't normally store records with refcount of 1, we can use it to record these in-progress extents. In-progress blocks cannot be shared because they're not user-visible, so there shouldn't be any conflicts with other programs. This is a better solution than holding EFIs during writeback because (a) EFIs can't be relogged currently, (b) even if they could, EFIs are bound by available log space, which puts an unnecessary upper bound on how much CoW we can have in flight, and (c) we already have a mechanism to track blocks. At mount time, read the refcount records and free anything we find with a refcount of 1 because those were in-progress when the FS went down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: cancel pending CoW reservations when destroying inodesDarrick J. Wong
When destroying the inode, cancel all pending reservations in the CoW fork so that all the reserved blocks go back to the free pile. In theory this sort of cleanup is only needed to clean up after write errors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: cancel CoW reservations and clear inode reflink flag when freeing blocksDarrick J. Wong
When we're freeing blocks (truncate, punch, etc.), clear all CoW reservations in the range being freed. If the file block count drops to zero, also clear the inode reflink flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: implement CoW for directio writesDarrick J. Wong
For O_DIRECT writes to shared blocks, we have to CoW them just like we would with buffered writes. For writes that are not block-aligned, just bounce them to the page cache. For block-aligned writes, however, we can do better than that. Use the same mechanisms that we employ for buffered CoW to set up a delalloc reservation, allocate all the blocks at once, issue the writes against the new blocks and use the same ioend functions to remap the blocks after the write. This should be fairly performant. Christoph discovered that xfs_reflink_allocate_cow_range may stumble over invalid entries in the extent array given that it drops the ilock but still expects the index to be stable. Simple fixing it to a new lookup for every iteration still isn't correct given that xfs_bmapi_allocate will trigger a BUG_ON() if hitting a hole, and there is nothing preventing a xfs_bunmapi_cow call removing extents once we dropped the ilock either. This patch duplicates the inner loop of xfs_bmapi_allocate into a helper for xfs_reflink_allocate_cow_range so that it can be done under the same ilock critical section as our CoW fork delayed allocation. The directio CoW warts will be revisited in a later patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: report shared extent mappings to userspace correctlyDarrick J. Wong
Report shared extents through the iomap interface so that FIEMAP flags shared blocks accurately. Have xfs_vm_bmap return zero for reflinked files because the bmap-based swap code requires static block mappings, which is incompatible with copy on write. NOTE: Existing userspace bmap users such as lilo will have the same problem with reflink files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-10-05proc: switch auxv to use of __mem_open()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05hpfs: support FIEMAPMikulas Patocka
Support the FIEMAP ioctl that reports extents allocated by a file. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_steal() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_confirm() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_release() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_get() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch default_file_splice_read() to use of pipe-backed iov_iterAl Viro
we only use iov_iter_get_pages_alloc() and iov_iter_advance() - pages are filled by kernel_readv() via a kvec array (as we used to do all along), so iov_iter here is used only as a way of arranging for those pages to be in pipe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch generic_file_splice_read() to use of ->read_iter()Al Viro
... and kill the ->splice_read() instances that can be switched to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05new iov_iter flavour: pipe-backedAl Viro
iov_iter variant for passing data into pipe. copy_to_iter() copies data into page(s) it has allocated and stuffs them into the pipe; copy_page_to_iter() stuffs there a reference to the page given to it. Both will try to coalesce if possible. iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages() and friends will do as copy_to_iter() would have and return the pages where the data would've been copied. iov_iter_advance() will truncate everything past the spot it has advanced to. New primitive: iov_iter_pipe(), used for initializing those. pipe should be locked all along. Running out of space acts as fault would for iovec-backed ones; in other words, giving it to ->read_iter() may result in short read if the pipe overflows, or -EFAULT if it happens with nothing copied there. In other words, ->read_iter() on those acts pretty much like ->splice_read(). Moreover, all generic_file_splice_read() users, as well as many other ->splice_read() instances can be switched to that scheme - that'll happen in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05xfs: move mappings from cow fork to data fork after copy-writeDarrick J. Wong
After the write component of a copy-write operation finishes, clean up the bookkeeping left behind. On error, we simply free the new blocks and pass the error up. If we succeed, however, then we must remove the old data fork mapping and move the cow fork mapping to the data fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: Call the CoW failure function during xfs_cancel_ioend] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: support removing extents from CoW forkDarrick J. Wong
Create a helper method to remove extents from the CoW fork without any of the side effects (rmapbt/bmbt updates) of the regular extent deletion routine. We'll eventually use this to clear out the CoW fork during ioend processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05fs/block_dev.c: return the right error in thaw_bdev()Pierre Morel
When triggering thaw-filesystems via magic sysrq, the system enters a loop in do_thaw_one(), as thaw_bdev() still returns success if bd_fsfreeze_count == 0. To fix this, let thaw_bdev() always return error (and simplify the code a bit at the same time). Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This adds POSIX ACL permission checking to the fuse kernel module. In addition there are minor bug fixes as well as cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: limit xattr returned size fuse: remove duplicate cs->offset assignment fuse: don't use fuse_ioctl_copy_user() helper fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter() fuse: get rid of fc->flags fuse: use timespec64 fuse: don't use ->d_time fuse: Add posix ACL support fuse: handle killpriv in userspace fs fuse: fix killing s[ug]id in setattr fuse: invalidate dir dentry after chmod fuse: Use generic xattr ops fuse: listxattr: verify xattr list