path: root/fs
Age | Commit message | Author
2018-01-29 | Merge tag 'upstream-4.16-rc1' of git://git.infradead.org/linux-ubifs | Linus Torvalds
Pull UBI/UBIFS updates from Richard Weinberger:
 - use the new fscrypt APIs
 - a fix for a Fastmap issue
 - other minor bug fixes
* tag 'upstream-4.16-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: block: Fix locking for idr_alloc/idr_remove
  mtd: ubi: wl: Fix error return code in ubi_wl_init()
  ubi: Fix copy/paste error in function documentation
  ubi: Fastmap: Fix typo
  ubifs: remove error message in ubifs_xattr_get
  ubi: fastmap: Erase outdated anchor PEBs during attach
  ubifs: switch to fscrypt_prepare_setattr()
  ubifs: switch to fscrypt_prepare_lookup()
  ubifs: switch to fscrypt_prepare_rename()
  ubifs: switch to fscrypt_prepare_link()
  ubifs: switch to fscrypt_file_open()
  ubi: fastmap: Clean up the initialization of pointer p
  ubi: fastmap: Use kmem_cache_free to deallocate memory
  ubi: Fix race condition between ubi volume creation and udev
  mtd: ubi: Use 'max_bad_blocks' to compute bad_peb_limit if available
  ubifs: Fix uninitialized variable in search_dh_cookie()
2018-01-29 | Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block updates from Jens Axboe:
"This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains:
 - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo.
 - Support for SMR zones for deadline and mq-deadline from Damien and Christoph.
 - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly.
 - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and removing rrpc 1.2 in preparation for supporting 2.0.
 - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes.
 - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping.
 - Series from Mike Snitzer that enables dm to register its queue more logically, something that's always been problematic on dm since it's a stacked device.
 - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs.
 - Various fixes from Ming closing up holes around queue mapping and quiescing.
 - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions.
 - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it got reused at the wrong time.
 - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me.
 - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me.
 - sgl_alloc/free helpers from Bart.
 - Heavily contended tag case scalability improvement from me.
 - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself"
* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 | btrfs: drop devid as device_list_add() arg | Anand Jain
Since struct btrfs_disk_super is already being passed, device_list_add() can get devid from it the same way its caller does. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-29 | btrfs: get device pointer from device_list_add() | Anand Jain
Instead of passing a pointer to btrfs_fs_devices as an argument to device_list_add(), it is better to get a pointer to btrfs_device as the return value; then we have both the pointer to btrfs_device and the pointer to btrfs_fs_devices. The btrfs_device is needed to handle a reappearing missing device. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-29 | GFS2: Don't try to end a non-existent transaction in unlink | Bob Peterson
Before this patch, if function gfs2_unlink failed to get a valid transaction (for example, not enough journal blocks) it would go to label out_end_trans which did gfs2_trans_end. But if the trans_begin failed, there's no transaction to end, and trying to do so results in: kernel BUG at fs/gfs2/trans.c:117! This patch changes the goto so that it does not try to end a non-existent transaction. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
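The control flow behind this fix can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the GFS2 code: `demo_state`, `demo_trans_begin`, `demo_trans_end`, `demo_unlink` and the `out_unlock` label are hypothetical names, and only the `out_end_trans` label name is taken from the commit message. The point is that a failed begin jumps past the label that ends the transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state touched by an unlink operation. */
struct demo_state {
    bool lock_held;
    bool trans_open;
    bool unlinked;
    int trans_end_calls;
};

/* Fails without creating a transaction when journal space is short. */
static int demo_trans_begin(struct demo_state *s, bool enough_journal_blocks)
{
    if (!enough_journal_blocks)
        return -ENOSPC;
    s->trans_open = true;
    return 0;
}

/* Ending a transaction that was never begun is treated as a bug. */
static void demo_trans_end(struct demo_state *s)
{
    assert(s->trans_open);
    s->trans_open = false;
    s->trans_end_calls++;
}

int demo_unlink(struct demo_state *s, bool enough_journal_blocks, bool work_fails)
{
    int error;

    s->lock_held = true;

    error = demo_trans_begin(s, enough_journal_blocks);
    if (error)
        goto out_unlock;        /* no transaction exists: skip out_end_trans */

    error = work_fails ? -EIO : 0;
    if (error)
        goto out_end_trans;     /* the transaction exists and must be ended */

    s->unlinked = true;

out_end_trans:
    demo_trans_end(s);
out_unlock:
    s->lock_held = false;
    return error;
}
```

Ordering the cleanup labels so that each one undoes exactly what has been set up so far keeps every error path from releasing a resource it never acquired.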
2018-01-29 | xfs: remove experimental tag for reflinks | Christoph Hellwig
But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: port to 4.16] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | xfs: don't screw up direct writes when freesp is fragmented | Darrick J. Wong
xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmented, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: check reflink allocation mappings | Darrick J. Wong
There's a really bad bug in xfs_reflink_allocate_cow -- bmapi_write can return a zero error code but no mappings. This happens if there's an extent size hint (which causes allocation requests to be rounded to extsz granularity internally), but there wasn't a big enough chunk of free space to start filling at the extsz granularity and fill even one block of the range that we actually requested. In any case, if we got no mappings we can't possibly do anything useful with the contents of imap, so we must bail out with ENOSPC here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
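The failure mode described here can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not XFS code: `demo_extent`, `demo_trim_to_request` and `demo_allocate_cow` are hypothetical names, and an allocation is modelled as a single extent that gets trimmed back to the originally requested range. When the allocated blocks do not overlap the request at all, the trim yields zero mappings and the caller has to report ENOSPC rather than use the output.

```c
#include <errno.h>

/* Hypothetical extent: a start offset and a length, both in blocks. */
struct demo_extent {
    unsigned long long off;
    unsigned long long len;
};

/*
 * Trim the allocated extent to the requested range.
 * Returns the number of mappings produced: 1 if the ranges overlap,
 * 0 if the allocation lies entirely outside the request.
 */
int demo_trim_to_request(const struct demo_extent *got,
                         const struct demo_extent *want,
                         struct demo_extent *out)
{
    unsigned long long got_end = got->off + got->len;
    unsigned long long want_end = want->off + want->len;
    unsigned long long start = got->off > want->off ? got->off : want->off;
    unsigned long long end = got_end < want_end ? got_end : want_end;

    if (start >= end)
        return 0;
    out->off = start;
    out->len = end - start;
    return 1;
}

/* A zero error code with zero mappings must not be treated as success. */
int demo_allocate_cow(const struct demo_extent *got,
                      const struct demo_extent *want,
                      struct demo_extent *imap)
{
    int nmaps = demo_trim_to_request(got, want, imap);

    if (nmaps == 0)
        return -ENOSPC;     /* imap holds nothing useful */
    return 0;
}
```

For example, a request for blocks 10 to 13 that is rounded out to a 16-block hint but only receives blocks 0 to 7 produces no usable mapping, while receiving blocks 0 to 11 yields a short mapping of two blocks.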
2018-01-29 | iomap: warn on zero-length mappings | Darrick J. Wong
Don't let the iomap callback get away with feeding us a garbage zero length mapping -- there was a bug in xfs that resulted in those leaking out to hilarious effect. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: treat CoW fork operations as delalloc for quota accounting | Darrick J. Wong
Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on-disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: only grab shared inode locks for source file during reflink | Darrick J. Wong
Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes | Darrick J. Wong
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
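A rough model of a two-inode lock helper that accepts a separate mode per inode might look like the following. Everything here is an assumption made for illustration: the `DEMO_*` flag values, `demo_inode`, and `demo_lock_two_inodes` are hypothetical and do not reflect the real XFS flags or signature, no actual locking is performed (the sketch only records what would be taken), and the "lower inode number first" ordering is a common deadlock-avoidance convention that the commit message itself does not state.

```c
#include <errno.h>

/* Hypothetical lock mode flags, grouped into two lock classes. */
#define DEMO_MMAPLOCK_SHARED 0x1u
#define DEMO_MMAPLOCK_EXCL   0x2u
#define DEMO_ILOCK_SHARED    0x4u
#define DEMO_ILOCK_EXCL      0x8u
#define DEMO_MMAPLOCK_CLASS  (DEMO_MMAPLOCK_SHARED | DEMO_MMAPLOCK_EXCL)
#define DEMO_ILOCK_CLASS     (DEMO_ILOCK_SHARED | DEMO_ILOCK_EXCL)

struct demo_inode {
    unsigned long long ino;
    unsigned int held;      /* mode recorded as held, 0 if none */
    unsigned int lock_seq;  /* order in which the lock was recorded */
};

static unsigned int demo_seq;

/* Map a mode to its lock class, or 0 if the mode is not recognised. */
static unsigned int demo_lock_class(unsigned int mode)
{
    if (mode & DEMO_MMAPLOCK_CLASS)
        return DEMO_MMAPLOCK_CLASS;
    if (mode & DEMO_ILOCK_CLASS)
        return DEMO_ILOCK_CLASS;
    return 0;
}

/* Record the lock instead of really taking one. */
static void demo_lock(struct demo_inode *ip, unsigned int mode)
{
    ip->held = mode;
    ip->lock_seq = ++demo_seq;
}

int demo_lock_two_inodes(struct demo_inode *ip0, unsigned int mode0,
                         struct demo_inode *ip1, unsigned int mode1)
{
    unsigned int class0 = demo_lock_class(mode0);

    /* The two modes may differ, but the lock class must match. */
    if (class0 == 0 || class0 != demo_lock_class(mode1))
        return -EINVAL;

    /* Assumed ordering rule: lower inode number is locked first. */
    if (ip0->ino > ip1->ino) {
        demo_lock(ip1, mode1);
        demo_lock(ip0, mode0);
    } else {
        demo_lock(ip0, mode0);
        demo_lock(ip1, mode1);
    }
    return 0;
}
```

With a helper of this shape, a reflink-style caller could pass a SHARED mode for the source inode and an EXCL mode for the destination in a single call.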
2018-01-29 | xfs: reflink should break pnfs leases before sharing blocks | Darrick J. Wong
Before we share blocks between files, we need to break the pnfs leases on the layout before we start slicing and dicing the block map. The structure of this function sets us up for the lock contention reduction in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: don't clobber inobt/finobt cursors when xref with rmap | Darrick J. Wong
Even if we can't use the inobt/finobt cursors to count the number of inode btree blocks, we are never allowed to clobber the cursor of the btree being checked, so don't do this. Found by fuzzing level = ones in xfs/364. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: skip CoW writes past EOF when writeback races with truncate | Darrick J. Wong
Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks when running fsstress, as we do in generic/269. The cause of this is writeback racing with truncate -- writeback doesn't take the iolock, so truncate can sneak in to decrease i_size and truncate page cache while writeback is gathering buffer heads to schedule writeout. If we hit this race on a block that has a CoW mapping, we'll get a valid imap from the CoW fork but the reduced i_size trims the mapping to zero length (which makes it invalid), so we call xfs_map_blocks to try again. This doesn't do much anyway, since any mapping we get out of that will also be invalid, so we might as well skip the assert and just stop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: preserve i_rdev when recycling a reclaimable inode | Amir Goldstein
Commit 66f364649d870 ("xfs: remove if_rdev") moved storing of rdev value for special inodes to VFS inodes, but forgot to preserve the value of i_rdev when recycling a reclaimable xfs_inode. This was detected by xfstest overlay/017 with index=on mount option and xfs base fs. The test does a lookup of overlay chardev and blockdev right after drop caches. Overlayfs inodes hold a reference on underlying xfs inodes when mount option index=on is configured. If drop caches reclaims xfs inodes before it reclaims overlayfs inodes, that can sometimes leave a reclaimable xfs inode, and that test hits that case quite often. When that happens, the xfs inode cache remains broken (zero i_rdev) until the next cycle mount or drop caches. Fixes: 66f364649d870 ("xfs: remove if_rdev") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | xfs: refactor accounting updates out of xfs_bmap_btalloc | Darrick J. Wong
Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-29 | xfs: refactor inode verifier corruption error printing | Darrick J. Wong
Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: make tracepoint inode number format consistent | Darrick J. Wong
Fix all the inode number formats to be consistently (0x%llx) in all trace point definitions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: always zero di_flags2 when we free the inode | Darrick J. Wong
Always zero the di_flags2 field when we free the inode so that we never end up with an on-disk record for an unallocated inode that also has the reflink iflag set. This is in keeping with the general principle that only files can have the reflink iflag set, even though we'll zero out di_flags2 if we ever reallocate the inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: call xfs_qm_dqattach before performing reflink operations | Darrick J. Wong
Ensure that we've attached all the necessary dquots before performing reflink operations so that quota accounting is accurate. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 | xfs: bmap code cleanup | Shan Hai
Remove the code related to extent size hints and realtime inodes from xfs_bmapi_reserve_delalloc, since it is not called on an inode with an extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | Use list_head infra-structure for buffer's log items list | Carlos Maiolino
Now that buffer's b_fspriv has been split, replace the current singly linked list of xfs_log_items with the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(); there is no need for this argument now that the log items can be walked through the list_head in the buffer. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | Split buffer's b_fspriv field | Carlos Maiolino
Split the b_fspriv field into two different fields (b_log_item and b_li_list). This makes it possible to get rid of an old ABI workaround by using the new b_log_item field to store the xfs_buf_log_item separately from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, which simplifies the buffer IO handling logic a bit. This also opens the possibility of changing the buffer's log items list into a proper list_head. The b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | Get rid of xfs_buf_log_item_t typedef | Carlos Maiolino
Take advantage of the rework on xfs_buf log items list to get rid of the typedef for xfs_buf_log_item. This patch also fixes some indentation alignment issues found along the way. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 | btrfs: only dirty the inode in btrfs_update_time if something was changed | Jeff Layton
At this point, we know that "now" and the file times may differ, and we suspect that the i_version has been flagged to be bumped. Attempt to bump the i_version, and only mark the inode dirty if that actually occurred or if one of the times was updated. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2018-01-29 | xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing | Jeff Layton
If XFS_ILOG_CORE is already set then go ahead and increment i_version. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2018-01-29 | fs: only set S_VERSION when updating times if necessary | Jeff Layton
We only really need to update i_version if someone has queried for it since we last incremented it. By doing that, we can avoid having to update the inode if the times haven't changed. If the times have changed, then we go ahead and forcibly increment the counter, under the assumption that we'll be going to the storage anyway, and the increment itself is relatively cheap. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
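One way to track "queried since the last increment" is to keep a flag next to the counter. The following is a minimal single-threaded sketch of that idea, not the kernel implementation: storing the flag in the low bit of the value, and the names `demo_inode`, `demo_query_iversion`, `demo_maybe_inc_iversion` and `demo_update_time`, are all assumptions made for illustration, and a real implementation would need atomic updates.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_QUERIED 1ULL   /* assumed: low bit marks "someone read the counter" */
#define DEMO_SHIFT   1      /* assumed: the counter lives in the remaining bits */

struct demo_inode {
    uint64_t i_version_raw;
};

/* Return the counter and remember that it has been observed. */
uint64_t demo_query_iversion(struct demo_inode *inode)
{
    inode->i_version_raw |= DEMO_QUERIED;
    return inode->i_version_raw >> DEMO_SHIFT;
}

/*
 * Increment the counter if forced, or if it was queried since the last
 * increment. Returns true when an increment actually happened.
 */
bool demo_maybe_inc_iversion(struct demo_inode *inode, bool force)
{
    uint64_t count;

    if (!force && !(inode->i_version_raw & DEMO_QUERIED))
        return false;

    /* Bump the counter part; shifting back in also clears the flag. */
    count = (inode->i_version_raw >> DEMO_SHIFT) + 1;
    inode->i_version_raw = count << DEMO_SHIFT;
    return true;
}

/*
 * Decide whether a timestamp update must dirty the inode: either the
 * times changed (which forces an increment) or the counter was bumped
 * because someone had queried it.
 */
bool demo_update_time(struct demo_inode *inode, bool times_changed)
{
    bool bumped = demo_maybe_inc_iversion(inode, times_changed);

    return times_changed || bumped;
}
```

In this model an update with unchanged times and no intervening query leaves the inode clean, which is the saving the commit message describes.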
2018-01-29 | xfs: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2018-01-29 | ufs: use new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | ocfs2: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-29 | nfsd: convert to new i_version API | Jeff Layton
Mostly just making sure we use the "get" wrappers so we know when it is being fetched for later use. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | nfs: convert to new i_version API | Jeff Layton
For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | ext4: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Theodore Ts'o <tytso@mit.edu>
2018-01-29 | ext2: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-29 | exofs: switch to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | btrfs: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: David Sterba <dsterba@suse.com>
2018-01-29 | afs: convert to new i_version API | Jeff Layton
For AFS, it's generally treated as an opaque value, so we use the *_raw variants of the API here. Note that AFS has quite a different definition for this counter. AFS only increments it on changes to the data in regular files and the contents of directories. Inode metadata changes do not result in a version increment. We'll need to reconcile that somehow if we ever want to present this to userspace via statx. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | affs: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | fat: convert to new i_version API | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-29 | fs: new API for handling inode->i_version | Jeff Layton
Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-28 | NFS: Fix a race between mmap() and O_DIRECT | Trond Myklebust
When locking the file in order to do O_DIRECT on it, we must unmap any mmapped ranges on the pagecache so that we can flush out the dirty data. Fixes: a5864c999de67 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+
2018-01-28 | fs/cifs/cifsacl.c Fixes typo in a comment | Achilles Gaikwad
Signed-off-by: Achilles Gaikwad <achillesgaikwad@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2018-01-28 | NFS: Remove a redundant call to unmap_mapping_range() | Trond Myklebust
We don't need to call unmap_mapping_range() prior to calling nfs_sync_mapping(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-26 | update internal version number for cifs.ko | Steve French
To version 2.11 Signed-off-by: Steve French <smfrench@gmail.com>
2018-01-26 | cifs: add .splice_write | Andrés Souto
Add splice_write support in cifs vfs using iter_file_splice_write. Signed-off-by: Andrés Souto <kai670@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2018-01-26 | CIFS: document tcon/ses/server refcount dance | Aurelien Aptel
Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-01-26 | move a few externs to smbdirect.h to eliminate warning | Steve French
Quiet minor sparse warnings in new SMB3 rdma patch series ("symbol was not declared ...") by moving these externs to smbdirect.h Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-01-26 | CIFS: zero sensitive data when freeing | Aurelien Aptel
Also replaces memset()+kfree() with kzfree(). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Cc: <stable@vger.kernel.org>
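The zero-then-free pattern can be sketched in user-space C as follows. This is an analogue for illustration only, not the kernel's kzfree(): `demo_memzero_explicit` and `demo_zfree` are hypothetical helpers, and the caller has to pass the buffer length because standard `malloc()` offers no portable way to query it. The volatile pointer is there because a plain `memset()` immediately before `free()` is a dead store that a compiler may legally drop.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Zero a buffer through a volatile pointer so the stores cannot be
 * optimised away as dead writes.
 */
void demo_memzero_explicit(void *p, size_t len)
{
    volatile unsigned char *vp = p;

    while (len--)
        *vp++ = 0;
}

/* Wipe a heap buffer holding sensitive data, then release it. */
void demo_zfree(void *p, size_t len)
{
    if (p == NULL)
        return;     /* mirror free(NULL): nothing to do */
    demo_memzero_explicit(p, len);
    free(p);
}
```

Folding both steps into one helper means call sites that free keys or passwords cannot forget the wipe.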