summaryrefslogtreecommitdiff
path: root/fs/ubifs/file.c
AgeCommit message (Collapse)Author
2017-09-08mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPYJérôme Glisse
Introduce a new migration mode that allow to offload the copy to a device DMA engine. This changes the workflow of migration and not all address_space migratepage callback can support this. This is intended to be use by migrate_vma() which itself is use for thing like HMM (see include/linux/hmm.h). No additional per-filesystem migratepage testing is needed. I disables MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i added comment in those to explain why (part of this patch). The commit message is unclear it should say that any callback that wish to support this new mode need to be aware of the difference in the migration flow from other mode. Some of these callbacks do extra locking while copying (aio, zsmalloc, balloon, ...) and for DMA to be effective you want to copy multiple pages in one DMA operations. But in the problematic case you can not easily hold the extra lock accross multiple call to this callback. Usual flow is: For each page { 1 - lock page 2 - call migratepage() callback 3 - (extra locking in some migratepage() callback) 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) 5 - copy page 6 - (unlock any extra lock of migratepage() callback) 7 - return from migratepage() callback 8 - unlock page } The new mode MIGRATE_SYNC_NO_COPY: 1 - lock multiple pages For each page { 2 - call migratepage() callback 3 - abort in all problematic migratepage() callback 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) } // finished all calls to migratepage() callback 5 - DMA copy multiple pages 6 - unlock all the pages To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a new callback migratepages() (for instance) that deals with multiple pages in one transaction. Because the problematic cases are not important for current usage I did not wanted to complexify this patchset even more for no good reason. Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-01fs: convert a pile of fsync routines to errseq_t based reportingJeff Layton
This patch converts most of the in-kernel filesystems that do writeback out of the pagecache to report errors using the errseq_t-based infrastructure that was recently added. This allows them to report errors once for each open file description. Most filesystems have a fairly straightforward fsync operation. They call filemap_write_and_wait_range to write back all of the data and wait on it, and then (sometimes) sync out the metadata. For those filesystems this is a straightforward conversion from calling filemap_write_and_wait_range in their fsync operation to calling file_write_and_wait_range. Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-14ubifs: Change gfp flags in page allocation for bulk readHyunchul Lee
In low memory situations, page allocations for bulk read can kill applications for reclaiming memory, and print an failure message when allocations are failed. Because bulk read is just an optimization, we don't have to do these and can stop page allocations. Though this siutation happens rarely, add __GFP_NORETRY to prevent from excessive memory reclaim and killing applications, and __GFP_WARN to suppress this failure message. For this, Use readahead_gfp_mask for gfp flags when allocating pages. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Remove dead code from ubifs_get_link()Richard Weinberger
We check the length already, no need to check later again for an empty string. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-05ubifs: don't bother checking for encryption key in ->mmap()Eric Biggers
Since only an open file can be mmap'ed, and we only allow open()ing an encrypted file when its key is available, there is no need to check for the key again before permitting each mmap(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-05ubifs: require key for truncate(2) of encrypted fileEric Biggers
Currently, filesystems allow truncate(2) on an encrypted file without the encryption key. However, it's impossible to correctly handle the case where the size being truncated to is not a multiple of the filesystem block size, because that would require decrypting the final block, zeroing the part beyond i_size, then encrypting the block. As other modifications to encrypted file contents are prohibited without the key, just prohibit truncate(2) as well, making it fail with ENOKEY. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-08fs: ubifs: replace CURRENT_TIME_SEC with current_timeDeepa Dinamani
CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned to use 64 bit time along with vfs in a separate patch. There is no plan to transition CURRENT_TIME_SEC to use y2038 safe time interfaces. current_time() returns timestamps according to the granularities set in the inode's super_block. The granularity check to call current_fs_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to update inode timestamp. Use timespec_trunc during file system creation, before the first inode is created. Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-17Merge uncontroversial parts of branch 'readlink' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers
2016-12-12ubifs: Add support for encrypted symlinksRichard Weinberger
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Implement encrypt/decrypt for all IORichard Weinberger
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Enforce crypto policy in mmapRichard Weinberger
We need this extra check in mmap because a process could gain an already opened fd. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Implement file open operationRichard Weinberger
We need ->open() for files to load the crypto key. If the no key is present and the file is encrypted, refuse to open. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-09vfs: remove ".readlink = generic_readlink" assignmentsMiklos Szeredi
If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.*=.*generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-11Merge tag 'upstream-4.9-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS updates from Richard Weinberger: "This pull request contains: - Fixes for both UBI and UBIFS - overlayfs support (O_TMPFILE, RENAME_WHITEOUT/EXCHANGE) - Code refactoring for the upcoming MLC support" [ Ugh, we just got rid of the "rename2()" naming for the extended rename functionality. And this re-introduces it in ubifs with the cross- renaming and whiteout support. But rather than do any re-organizations in the merge itself, the naming can be cleaned up later ] * tag 'upstream-4.9-rc1' of git://git.infradead.org/linux-ubifs: (27 commits) UBIFS: improve function-level documentation ubifs: fix host xattr_len when changing xattr ubifs: Use move variable in ubifs_rename() ubifs: Implement RENAME_EXCHANGE ubifs: Implement RENAME_WHITEOUT ubifs: Implement O_TMPFILE ubi: Fix Fastmap's update_vol() ubi: Fix races around ubi_refill_pools() ubi: Deal with interrupted erasures in WL UBI: introduce the VID buffer concept UBI: hide EBA internals UBI: provide an helper to query LEB information UBI: provide an helper to check whether a LEB is mapped or not UBI: add an helper to check lnum validity UBI: simplify LEB write and atomic LEB change code UBI: simplify recover_peb() code UBI: move the global ech and vidh variables into struct ubi_attach_info UBI: provide helpers to allocate and free aeb elements UBI: fastmap: use ubi_io_{read, write}_data() instead of ubi_io_{read, write}() UBI: fastmap: use ubi_rb_for_each_entry() in unmap_peb() ...
2016-10-10Merge branch 'work.xattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "xattr stuff from Andreas This completes the switch to xattr_handler ->get()/->set() from ->getxattr/->setxattr/->removexattr" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Remove {get,set,remove}xattr inode operations xattr: Stop calling {get,set,remove}xattr inode operations vfs: Check for the IOP_XATTR flag in listxattr xattr: Add __vfs_{get,set,remove}xattr helpers libfs: Use IOP_XATTR flag for empty directory handling vfs: Use IOP_XATTR flag for bad-inode handling vfs: Add IOP_XATTR inode operations flag vfs: Move xattr_resolve_name to the front of fs/xattr.c ecryptfs: Switch to generic xattr handlers sockfs: Get rid of getxattr iop sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names kernfs: Switch to generic xattr handlers hfs: Switch to generic xattr handlers jffs2: Remove jffs2_{get,set,remove}xattr macros xattr: Remove unnecessary NULL attribute name check
2016-10-07vfs: Remove {get,set,remove}xattr inode operationsAndreas Gruenbacher
These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-02UBIFS: improve function-level documentationJulia Lawall
Fix various inconsistencies in the documentation associated with various functions. In the case of fs/ubifs/lprops.c, the second parameter of ubifs_get_lp_stats was renamed from st to lst in commit 84abf972ccff ("UBIFS: add re-mount debugging checks") In the case of fs/ubifs/lpt_commit.c, the excess variables have never existed in the associated functions since the code was introduced into the kernel. The others appear to be straightforward typos. Issues detected using Coccinelle (http://coccinelle.lip6.fr/) Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-09-22fs: Give dentry to inode_change_ok() instead of inodeJan Kara
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-06-23UBIFS: Implement ->migratepage()Kirill A. Shutemov
During page migrations UBIFS might get confused and the following assert triggers: [ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436) [ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008 [ 213.490000] Hardware name: Allwinner sun4i/sun5i Families [ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14) [ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0) [ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50) [ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8) [ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290) [ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80) [ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0) [ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4) [ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0) [ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8) [ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274) [ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c) [ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0) [ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8) [ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48) [ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444) [ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614) [ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c) [ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34) UBIFS is using PagePrivate() which can have different meanings across filesystems. Therefore the generic page migration code cannot handle this case correctly. We have to implement our own migration function which basically does a plain copy but also duplicates the page private flag. UBIFS is not a block device filesystem and cannot use buffer_migrate_page(). Cc: stable@vger.kernel.org Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de>
2016-05-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull remaining vfs xattr work from Al Viro: "The rest of work.xattr (non-cifs conversions)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: Switch to generic xattr handlers ubifs: Switch to generic xattr handlers jfs: Switch to generic xattr handlers jfs: Clean up xattr name mapping gfs2: Switch to generic xattr handlers ceph: kill __ceph_removexattr() ceph: Switch to generic xattr handlers ceph: Get rid of d_find_alias in ceph_set_acl
2016-05-17ubifs: Switch to generic xattr handlersAndreas Gruenbacher
Ubifs internally uses special inodes for storing xattrs. Those inodes had NULL {get,set,remove}xattr inode operations before this change, so xattr operations on them would fail. The super block's s_xattr field would also apply to those special inodes. However, the inodes are not visible outside of ubifs, and so no xattr operations will ever be carried out on them anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-04mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07ubifs: introduce UBIFS_ATIME_SUPPORT to ubifsDongsheng Yang
To make ubifs support atime flexily, this commit introduces a Kconfig option named as UBIFS_ATIME_SUPPORT. With UBIFS_ATIME_SUPPORT=n: ubifs keeps the full compatibility to no_atime from the start of ubifs. =================UBIFS_ATIME_SUPPORT=n======================= -o - no atime -o atime - no atime -o noatime - no atime -o relatime - no atime -o strictatime - no atime -o lazyatime - no atime With UBIFS_ATIME_SUPPORT=y: ubifs supports the atime same with other main stream file systems. =================UBIFS_ATIME_SUPPORT=y======================= -o - default behavior (relatime currently) -o atime - atime support -o noatime - no atime support -o relatime - relative atime support -o strictatime - strict atime support -o lazyatime - lazy atime support Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at>
2015-05-10ubifs: switch to simple_follow_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-15Merge tag 'upstream-4.1-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS updates from Richard Weinberger: "This pull request includes the following UBI/UBIFS changes: - powercut emulation for UBI - a huge update to UBI Fastmap - cleanups and bugfixes all over UBI and UBIFS" * tag 'upstream-4.1-rc1' of git://git.infradead.org/linux-ubifs: (50 commits) UBI: power cut emulation for testing UBIFS: fix output format of INUM_WATERMARK UBI: Fastmap: Fall back to scanning mode after ECC error UBI: Fastmap: Remove is_fm_block() UBI: Fastmap: Add blank line after declarations UBI: Fastmap: Remove else after return. UBI: Fastmap: Introduce may_reserve_for_fm() UBI: Fastmap: Introduce ubi_fastmap_init() UBI: Fastmap: Wire up WL accessor functions UBI: Add accessor functions for WL data structures UBI: Move fastmap specific functions out of wl.c UBI: Fastmap: Add new module parameter fm_debug UBI: Fastmap: Make self_check_eba() depend on fastmap self checking UBI: Fastmap: Add self check to detect absent PEBs UBI: Fix stale pointers in ubi->lookuptbl UBI: Fastmap: Enhance fastmap checking UBI: Add initial support for fastmap self checks UBI: Fastmap: Rework fastmap error paths UBI: Fastmap: Prepare for variable sized fastmaps UBI: Fastmap: Locking updates ...
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11make new_sync_{read,write}() staticAl Viro
All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25fs: move struct kiocb to fs.hChristoph Hellwig
struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25UBIFS: extend debug/message capabilitiesSheng Yong
In the case where we have more than one volumes on different UBI devices, it may be not that easy to tell which volume prints the messages. Add ubi number and volume id in ubifs_msg/warn/error to help debug. These two values are passed by struct ubifs_info. For those where ubifs_info is not initialized yet, ubifs_* is replaced by pr_*. For those where ubifs_info is not avaliable, ubifs_info is passed to the calling function as a const parameter. The output looks like, [ 95.444879] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 696 [ 95.484688] UBIFS (ubi0:1): UBIFS: mounted UBI device 0, volume 1, name "test1" [ 95.484694] UBIFS (ubi0:1): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes [ 95.484699] UBIFS (ubi0:1): FS size: 30220288 bytes (28 MiB, 238 LEBs), journal size 1523712 bytes (1 MiB, 12 LEBs) [ 95.484703] UBIFS (ubi0:1): reserved for root: 1427378 bytes (1393 KiB) [ 95.484709] UBIFS (ubi0:1): media format: w4/r0 (latest is w4/r0), UUID 40DFFC0E-70BE-4193-8905-F7D6DFE60B17, small LPT model [ 95.489875] UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 699 [ 95.529713] UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "test2" [ 95.529718] UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes [ 95.529724] UBIFS (ubi1:0): FS size: 19808256 bytes (18 MiB, 156 LEBs), journal size 1015809 bytes (0 MiB, 8 LEBs) [ 95.529727] UBIFS (ubi1:0): reserved for root: 935592 bytes (913 KiB) [ 95.529733] UBIFS (ubi1:0): media format: w4/r0 (latest is w4/r0), UUID EEB7779D-F419-4CA9-811B-831CAC7233D4, small LPT model [ 954.264767] UBIFS error (ubi1:0 pid 756): ubifs_read_node: bad node type (255 but expected 6) [ 954.367030] UBIFS error (ubi1:0 pid 756): ubifs_read_node: bad node at LEB 0:0, LEB mapping status 1 Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-02-15Merge branch 'for-linus-v3.20' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI and UBIFS updates from Richard Weinberger: - cleanups and bug fixes all over UBI and UBIFS - block-mq support for UBI Block - UBI volumes can now be renamed while they are in use - security.* XATTR support for UBIFS - a maintainer update * 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs: UBI: block: Fix checking for NULL instead of IS_ERR() UBI: block: Continue creating ubiblocks after an initialization error UBIFS: return -EINVAL if log head is empty UBI: Block: Explain usage of blk_rq_map_sg() UBI: fix soft lockup in ubi_check_volume() UBI: Fastmap: Care about the protection queue UBIFS: add a couple of extra asserts UBI: do propagate positive error codes up UBI: clean-up printing helpers UBI: extend UBI layer debug/messaging capabilities - cosmetics UBIFS: add ubifs_err() to print error reason UBIFS: Add security.* XATTR support for the UBIFS UBIFS: Add xattr support for symlinks UBI: Block: Add blk-mq support UBI: Add initial support for scatter gather UBI: rename_volumes: Use UBI_METAONLY UBI: Implement UBI_METAONLY Add myself as UBI co-maintainer
2015-02-10mm: drop vm_ops->remap_pages and generic_file_remap_pages() stubKirill A. Shutemov
Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-28UBIFS: Add xattr support for symlinksSubodh Nijsure
Artem: rename the __ubifs_setxattr() functions to just 'setxattr()'. Signed-off-by: Subodh Nijsure <snijsure@grid-net.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Shelton <ben.shelton@ni.com> Acked-by: Terry Wilcox <terry.wilcox@ni.com> Acked-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-11-07UBIFS: fix budget leak in error pathArtem Bityutskiy
We forgot to free the budget in 'write_begin_slow()' when 'do_readpage()' fails. This patch fixes the issue. Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...
2014-06-12->splice_write() via ->write_iter()Al Viro
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-28UBIFS: fix debugging checkArtem Bityutskiy
The debugging check which verifies that we never write outside of the file length was incorrect, since it was multiplying file length by the page size, instead of dividing. Fix this. Spotted-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-05-27UBIFS: add missing ui pointer in debugging codeDaniel Golle
If UBIFS_DEBUG is defined an additional assertion of the ui_lock spinlock in do_writepage cannot compile because the ui pointer has not been previously declared. Fix this by declaring and initializing the ui pointer in case UBIFS_DEBUG is defined. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-05-13UBIFS: fix an mmap and fsync race conditionhujianyang
There is a race condition in UBIFS: Thread A (mmap) Thread B (fsync) ->__do_fault ->write_cache_pages -> ubifs_vm_page_mkwrite -> budget_space -> lock_page -> release/convert_page_budget -> SetPagePrivate -> TestSetPageDirty -> unlock_page -> lock_page -> TestClearPageDirty -> ubifs_writepage -> do_writepage -> release_budget -> ClearPagePrivate -> unlock_page -> !(ret & VM_FAULT_LOCKED) -> lock_page -> set_page_dirty -> ubifs_set_page_dirty -> TestSetPageDirty (set page dirty without budgeting) -> unlock_page This leads to situation where we have a diry page but no budget allocated for this page, so further write-back may fail with -ENOSPC. In this fix we return from page_mkwrite without performing unlock_page. We return VM_FAULT_LOCKED instead. After doing this, the race above will not happen. Signed-off-by: hujianyang <hujianyang@huawei.com> Tested-by: Laurence Withers <lwithers@guralp.com> Cc: stable@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-05-06ubifs: switch to ->write_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06switch simple generic_file_aio_read() users to ->read_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-07mm: implement ->map_pages for page cacheKirill A. Shutemov
filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-21mm: change invalidatepage prototype to accept lengthLukas Czerner
Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
2013-05-07aio: don't include aio.h in sched.hKent Overstreet
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>