path: root/include/linux/fs.h
Age | Commit message | Author
2016-03-17 | fs crypto: move per-file encryption from f2fs tree to fs/crypto | Jaegeuk Kim
This patch adds the renamed functions moved from the f2fs crypto files.
1. definitions for per-file encryption used by ext4 and f2fs.
2. crypto.c for encrypt/decrypt functions
   a. IO preparation:
      - fscrypt_get_ctx / fscrypt_release_ctx
   b. before IOs:
      - fscrypt_encrypt_page
      - fscrypt_decrypt_page
      - fscrypt_zeroout_range
   c. after IOs:
      - fscrypt_decrypt_bio_pages
      - fscrypt_pullback_bio_page
      - fscrypt_restore_control_page
3. policy.c supporting context management.
   a. For ioctls:
      - fscrypt_process_policy
      - fscrypt_get_policy
   b. For context permission
      - fscrypt_has_permitted_context
      - fscrypt_inherit_context
4. keyinfo.c to handle permissions
   - fscrypt_get_encryption_info
   - fscrypt_free_encryption_info
5. fname.c to support filename encryption
   a. general wrapper functions
      - fscrypt_fname_disk_to_usr
      - fscrypt_fname_usr_to_disk
      - fscrypt_setup_filename
      - fscrypt_free_filename
   b. specific filename handling functions
      - fscrypt_fname_alloc_buffer
      - fscrypt_fname_free_buffer
6. Makefile and Kconfig
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-14 | kill dentry_unhash() | Al Viro
the last user is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-04 | vfs: add the RWF_HIPRI flag for preadv2/pwritev2 | Christoph Hellwig
This adds a flag that tells the file system that this is a high priority request for which it is worth polling the hardware. The flag is purely advisory and can be ignored if not supported. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
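From user space, the flag described above is passed as the last argument of preadv2(). The following is a minimal sketch, assuming a glibc that exports preadv2() (2.26 or later); `read_hipri` and `hipri_demo` are hypothetical helper names, not part of the commit. Because the flag is advisory, the sketch falls back to a plain preadv() when the flag is rejected.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001 /* uapi value, used only if libc does not export it */
#endif

/* Hypothetical helper: positional read that asks the kernel to poll for completion. */
static ssize_t read_hipri(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t n = preadv2(fd, &iov, 1, off, RWF_HIPRI);

	/* The flag is advisory: if it is rejected, retry as a plain preadv(). */
	if (n < 0 && (errno == EOPNOTSUPP || errno == EINVAL || errno == ENOSYS))
		n = preadv(fd, &iov, 1, off);
	return n;
}

/* Self-check: write four bytes to a temp file and read them back. */
static int hipri_demo(void)
{
	char path[] = "/tmp/hipri-XXXXXX";
	char buf[8] = { 0 };
	int fd = mkstemp(path);
	ssize_t n;

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "fs.h", 4) != 4) {
		close(fd);
		return -1;
	}
	n = read_hipri(fd, buf, sizeof(buf) - 1, 0);
	close(fd);
	return (n == 4 && memcmp(buf, "fs.h", 4) == 0) ? 0 : -1;
}
```

On a buffered file such as this temp file the flag is expected to make no observable difference; polling only matters for direct I/O on devices that support it.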
2016-03-04 | vfs: pass a flags argument to vfs_readv/vfs_writev | Christoph Hellwig
This way we can set kiocb flags also from the sync read/write path for the read_iter/write_iter operations. For now there is no way to pass flags to plain read/write operations as there is no real need for that, and all flags passed are explicitly rejected for these files. Signed-off-by: Milosz Tanski <milosz@adfin.com> [hch: rebased on top of my kiocb changes] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-21 | ima: load policy using path | Dmitry Kasatkin
We currently cannot do appraisal or signature vetting of IMA policies since we currently can only load IMA policies by writing the contents of the policy directly in, as follows: cat policy-file > <securityfs>/ima/policy If we provide the kernel the path to the IMA policy so it can load the policy itself it'd be able to later appraise or vet the file signature if it has one. This patch adds support to load the IMA policy with a given path as follows: echo /etc/ima/ima_policy > /sys/kernel/security/ima/policy Changelog v4+: - moved kernel_read_file_from_path() error messages to callers v3: - moved kernel_read_file_from_path() to a separate patch v2: - after re-ordering the patches, replace calling integrity_kernel_read() to read the file with kernel_read_file_from_path() (Mimi) - Patch description re-written by Luis R. Rodriguez Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-02-21 | kexec: replace call to copy_file_from_fd() with kernel version | Mimi Zohar
Replace copy_file_from_fd() with kernel_read_file_from_fd(). Two new identifiers named READING_KEXEC_IMAGE and READING_KEXEC_INITRAMFS are defined for measuring, appraising or auditing the kexec image and initramfs. Changelog v3: - return -EBADF, not -ENOEXEC - identifier change - split patch, moving copy_file_from_fd() to a separate patch - split patch, moving IMA changes to a separate patch v0: - use kstat file size type loff_t, not size_t - Calculate the file hash from the in memory buffer - Dave Young Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Acked-by: Dave Young <dyoung@redhat.com>
2016-02-21 | module: replace copy_module_from_fd with kernel version | Mimi Zohar
Replace copy_module_from_fd() with kernel_read_file_from_fd(). Although none of the upstreamed LSMs define a kernel_module_from_file hook, IMA is called, based on policy, to prevent unsigned kernel modules from being loaded by the original kernel module syscall and to measure/appraise signed kernel modules. The security function security_kernel_module_from_file() was called prior to reading a kernel module. Preventing unsigned kernel modules from being loaded by the original kernel module syscall remains on the pre-read kernel_read_file() security hook. Instead of reading the kernel module twice, once for measuring/appraising and again for loading the kernel module, the signature validation is moved to the kernel_post_read_file() security hook. This patch removes the security_kernel_module_from_file() hook and security call. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
2016-02-21 | vfs: define kernel_copy_file_from_fd() | Mimi Zohar
This patch defines kernel_read_file_from_fd(), a wrapper for the VFS common kernel_read_file(). Changelog: - Separated from the kernel modules patch Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-02-21 | firmware: replace call to fw_read_file_contents() with kernel version | Mimi Zohar
Replace fw_read_file_contents() with kernel_read_file_from_path(). Although none of the upstreamed LSMs define a kernel_fw_from_file hook, IMA is called by the security function to prevent unsigned firmware from being loaded and to measure/appraise signed firmware, based on policy. Instead of reading the firmware twice, once for measuring/appraising the firmware and again for reading the firmware contents into memory, the kernel_post_read_file() security hook calculates the file hash based on the in memory file buffer. The firmware is read once. This patch removes the LSM kernel_fw_from_file() hook and security call. Changelog v4+: - revert dropped buf->size assignment - reported by Sergey Senozhatsky v3: - remove kernel_fw_from_file hook - use kernel_read_file_from_path() - requested by Luis v2: - reordered and squashed firmware patches - fix MAX firmware size (Kees Cook) Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
2016-02-21 | vfs: define kernel_read_file_from_path | Mimi Zohar
This patch defines kernel_read_file_from_path(), a wrapper for the VFS common kernel_read_file(). Changelog: - revert error msg regression - reported by Sergey Senozhatsky - Separated from the IMA patch Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-02-18 | vfs: define kernel_read_file_id enumeration | Mimi Zohar
To differentiate between the kernel_read_file() callers, this patch defines a new enumeration named kernel_read_file_id and includes the caller identifier as an argument. Subsequent patches define READING_KEXEC_IMAGE, READING_KEXEC_INITRAMFS, READING_FIRMWARE, READING_MODULE, and READING_POLICY. Changelog v3: - Replace the IMA specific enumeration with a generic one. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
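The idea of tagging each read with a caller identifier can be modeled in user space. The sketch below is not the kernel implementation: the enum order, `model_read_file`, `post_read_hook_t`, `deny_modules` and `read_id_demo` are all hypothetical, and a stdio `FILE *` stands in for the kernel's `struct file *`. It only illustrates how a post-read hook can make a decision based on who is reading.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Caller identifiers named in the commit message; the enum layout is assumed. */
enum read_file_id {
	READING_FIRMWARE,
	READING_MODULE,
	READING_KEXEC_IMAGE,
	READING_KEXEC_INITRAMFS,
	READING_POLICY,
};

/* Hypothetical stand-in for a post-read security hook. */
typedef int (*post_read_hook_t)(const void *buf, long size, enum read_file_id id);

/* Userspace model: read the whole file (bounded by max_size), then run the hook. */
static int model_read_file(FILE *f, void **buf, long *size, long max_size,
			   enum read_file_id id, post_read_hook_t hook)
{
	long len;
	void *p;
	int ret;

	if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0)
		return -1;
	if (max_size > 0 && len > max_size)
		return -1;		/* bounds check before allocating */
	rewind(f);
	p = malloc(len > 0 ? (size_t)len : 1);
	if (!p)
		return -1;
	if (fread(p, 1, (size_t)len, f) != (size_t)len) {
		free(p);
		return -1;
	}
	ret = hook ? hook(p, len, id) : 0;	/* hook sees the buffer and the caller id */
	if (ret) {
		free(p);
		return ret;
	}
	*buf = p;
	*size = len;
	return 0;
}

/* Example policy: refuse anything read on behalf of the module loader. */
static int deny_modules(const void *buf, long size, enum read_file_id id)
{
	(void)buf;
	(void)size;
	return id == READING_MODULE ? -1 : 0;
}

static int read_id_demo(void)
{
	FILE *f = tmpfile();
	void *buf = NULL, *unused = NULL;
	long size = 0, unused_size = 0;
	int allowed, denied;

	if (!f)
		return -1;
	fputs("blob", f);
	allowed = model_read_file(f, &buf, &size, 16, READING_FIRMWARE, deny_modules);
	denied = model_read_file(f, &unused, &unused_size, 16, READING_MODULE, deny_modules);
	fclose(f);
	free(buf);
	return (allowed == 0 && size == 4 && denied == -1) ? 0 : -1;
}
```

The same file contents are accepted for one caller and rejected for another, which is the distinction the enumeration exists to make.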
2016-02-18 | vfs: define a generic function to read a file from the kernel | Mimi Zohar
For a while it was looked down upon to directly read files from Linux. These days, though, a few mechanisms exist in the kernel that do just this to load a file into a local buffer. Each has minor but important differences in its checks. This patch set is the first attempt at resolving some of these differences. This patch introduces a common function for reading files from the kernel with the corresponding security post-read hook and function. Changelog v4+: - export security_kernel_post_read_file() - Fengguang Wu v3: - additional bounds checking - Luis v2: - To simplify patch review, re-ordered patches Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-02-08 | direct-io: always call ->end_io if non-NULL | Christoph Hellwig
This way we can pass back errors to the file system, and allow for cleanup required for all direct I/O invocations. Also allow the ->end_io handlers to return errors on their own, so that I/O completion errors can be passed on to the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-30 | block: revert runtime dax control of the raw block device | Dan Williams
Dynamically enabling DAX requires that the page cache first be flushed and invalidated. This must occur atomically with the change of DAX mode otherwise we confuse the fsync/msync tracking and violate data durability guarantees. Eliminate the possibility of DAX-disabled to DAX-enabled transitions for now and revisit this for the next cycle. Cc: Jan Kara <jack@suse.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30 | fs, block: force direct-I/O for dax-enabled block devices | Dan Williams
Similar to the file I/O path, re-direct all I/O to the DAX path for I/O to a block-device special file. Both regular files and device special files can use the common filp->f_mapping->host lookup to determine if DAX is enabled. Otherwise, we confuse the DAX code that does not expect to find live data in the page cache:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217 __delete_from_page_cache+0x9f6/0xb60()
Modules linked in:
CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
 ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
 ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
 [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
 [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
 [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
 [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
 [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
 [< inline >] wp_pfn_shared mm/memory.c:2208
 [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
 [< inline >] handle_pte_fault mm/memory.c:3323
 [< inline >] __handle_mm_fault mm/memory.c:3417
 [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
 [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
 [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
 [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
 [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
 [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
---[ end trace dae21e0f85f1f98c ]---
Fixes: 5a023cdba50c ("block: enable dax for raw block devices") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-23 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull final vfs updates from Al Viro: - The ->i_mutex wrappers (with small prereq in lustre) - a fix for too early freeing of symlink bodies on shmem (they need to be RCU-delayed) (-stable fodder) - followup to dedupe stuff merged this cycle * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: abort dedupe loop if fatal signals are pending make sure that freeing shmem fast symlinks is RCU-delayed wrappers for ->i_mutex access lustre: remove unused declaration
2016-01-22 | dax: support dirty DAX entries in radix tree | Ross Zwisler
Add support for tracking dirty DAX entries in the struct address_space radix tree. This tree is already used for dirty page writeback, and it already supports the use of exceptional (non struct page*) entries. In order to properly track dirty DAX pages we will insert new exceptional entries into the radix tree that represent dirty DAX PTE or PMD pages. These exceptional entries will also contain the writeback addresses for the PTE or PMD faults that we can use at fsync/msync time. There are currently two types of exceptional entries (shmem and shadow) that can be placed into the radix tree, and this adds a third. We rely on the fact that only one type of exceptional entry can be found in a given radix tree based on its usage. This happens for free with DAX vs shmem but we explicitly prevent shadow entries from being added to radix trees for DAX mappings. The only shadow entries that would be generated for DAX radix trees would be to track zero page mappings that were created for holes. These pages would receive minimal benefit from having shadow entries, and the choice to have only one type of exceptional entry in a given radix tree makes the logic simpler both in clear_exceptional_entry() and in the rest of DAX. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 | wrappers for ->i_mutex access | Al Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
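The naming scheme, inode_foo(inode) wrapping mutex_foo(&inode->i_mutex), can be illustrated with a user-space model. This is a sketch under stated assumptions: a pthread mutex stands in for the kernel mutex, `struct inode` here contains only the lock, and `inode_lock_demo` is a hypothetical self-check. The point is that callers no longer name the lock's type, so it can later be swapped (for an rwsem, as the message announces) without touching them.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for the kernel's struct inode; only the lock is modeled. */
struct inode {
	pthread_mutex_t i_mutex;
};

static inline void inode_lock(struct inode *inode)
{
	pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
	pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns 1 if the lock was taken, 0 otherwise (the mutex_trylock() convention). */
static inline int inode_trylock(struct inode *inode)
{
	return pthread_mutex_trylock(&inode->i_mutex) == 0;
}

/* Self-check: trylock fails while the lock is held and succeeds once it is free. */
static int inode_lock_demo(void)
{
	struct inode inode = { .i_mutex = PTHREAD_MUTEX_INITIALIZER };
	int contended, free_now;

	inode_lock(&inode);
	contended = inode_trylock(&inode);	/* lock is held, so this is 0 */
	inode_unlock(&inode);

	free_now = inode_trylock(&inode);	/* lock is free, so this is 1 */
	if (free_now)
		inode_unlock(&inode);

	return (contended == 0 && free_now == 1) ? 0 : -1;
}
```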
2016-01-13 | Merge tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds
Pull libnvdimm updates from Dan Williams: "The bulk of this has appeared in -next and independently received a build success notification from the kbuild robot. The 'for-4.5/block-dax' topic branch was rebased over the weekend to drop the "block device end-of-life" rework that Al would like to see re-implemented with a notifier, and to address bug reports against the badblocks integration. There is pending feedback against "libnvdimm: Add a poison list and export badblocks" received last week. Linda identified some localized fixups that we will handle incrementally. Summary: - Media error handling: The 'badblocks' implementation that originated in md-raid is up-levelled to a generic capability of a block device. This initial implementation is limited to being consulted in the pmem block-i/o path. Later, 'badblocks' will be consulted when creating dax mappings. - Raw block device dax: For virtualization and other cases that want large contiguous mappings of persistent memory, add the capability to dax-mmap a block device directly. - Increased /dev/mem restrictions: Add an option to treat all io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is actively using an address range. This behavior is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the existing "iomem=relaxed" kernel command line option. 
- Miscellaneous fixes include a 'pfn'-device huge page alignment fix, block device shutdown crash fix, and other small libnvdimm fixes" * tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits) block: kill disk_{check|set|clear|alloc}_badblocks libnvdimm, pmem: nvdimm_read_bytes() badblocks support pmem, dax: disable dax in the presence of bad blocks pmem: fail io-requests to known bad blocks libnvdimm: convert to statically allocated badblocks libnvdimm: don't fail init for full badblocks list block, badblocks: introduce devm_init_badblocks block: clarify badblocks lifetime badblocks: rename badblocks_free to badblocks_exit libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h libnvdimm: Add a poison list and export badblocks nfit_test: Enable DSMs for all test NFITs md: convert to use the generic badblocks code block: Add badblock management for gendisks badblocks: Add core badblock management code block: fix del_gendisk() vs blkdev_ioctl crash block: enable dax for raw block devices block: introduce bdev_file_inode() restrict /dev/mem to idle io memory ranges arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug ...
2016-01-12 | Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfsd readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). 
To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
2016-01-12 | Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper
2016-01-12 | Merge tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux | Linus Torvalds
Pull file locking updates from Jeff Layton: "File locking related changes for v4.5 (pile #1) Highlights: - new Kconfig option to allow disabling mandatory locking (which is racy anyway) - new tracepoints for setlk and close codepaths - fix for a long-standing bug in code that handles races between setting a POSIX lock and close()" * tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux: locks: rename __posix_lock_file to posix_lock_inode locks: prink more detail when there are leaked locks locks: pass inode pointer to locks_free_lock_context locks: sprinkle some tracepoints around the file locking code locks: don't check for race with close when setting OFD lock locks: fix unlock when fcntl_setlk races with a close fs: make locks.c explicitly non-modular locks: use list_first_entry_or_null() locks: Don't allow mounts in user namespaces to enable mandatory locking locks: Allow disabling mandatory locking at compile time
2016-01-11 | Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-09 | block: enable dax for raw block devices | Dan Williams
If an application wants exclusive access to all of the persistent memory provided by an NVDIMM namespace it can use this raw-block-dax facility to forgo establishing a filesystem. This capability is targeted primarily to hypervisors wanting to provision persistent memory for guests. It can be disabled / enabled dynamically via the new BLKDAXSET ioctl. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Jan Kara <jack@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-08 | Merge branch 'for-linus' into work.misc | Al Viro
2016-01-08 | compat_ioctl: don't pass fd around when not needed | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 | locks: pass inode pointer to locks_free_lock_context | Jeff Layton
...so we can print information about it if there are leaked locks. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-01 | vfs: hoist the btrfs deduplication ioctl to the vfs | Darrick J. Wong
Hoist the btrfs EXTENT_SAME ioctl up to the VFS and make the name more systematic (FIDEDUPERANGE). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 | switch ->get_link() to delayed_call, kill ->put_link() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-29 | kill free_page_put_link() | Al Viro
all callers are better off with kfree_put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-23 | new helpers: no_seek_end_llseek{,_size}() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 | replace ->follow_link() with new method that could stay in RCU mode | Al Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
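The calling convention described above (NULL dentry means RCU mode, and ERR_PTR(-ECHILD) means "call me again in non-RCU mode") can be modeled in user space. This is a sketch only: the struct layouts, `model_get_link` and `get_link_demo` are hypothetical, the error-pointer helpers are minimal local copies of the kernel idiom, and the extra cookie argument of the real method is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Minimal userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct dentry { int unused; };
struct inode {
	const char *cached_body; /* hypothetical: symlink body already in memory, or NULL */
	const char *disk_body;   /* hypothetical: body that would need blocking I/O */
};

/* Hypothetical ->get_link() instance following the convention in the message. */
static const char *model_get_link(struct dentry *dentry, struct inode *inode)
{
	if (inode->cached_body)
		return inode->cached_body;	/* never blocks, fine in either mode */
	if (!dentry)
		return ERR_PTR(-ECHILD);	/* RCU mode: would block, ask for a retry */
	return inode->disk_body;		/* non-RCU mode: stands in for blocking I/O */
}

static int get_link_demo(void)
{
	struct dentry d = { 0 };
	struct inode fast = { .cached_body = "target", .disk_body = NULL };
	struct inode slow = { .cached_body = NULL, .disk_body = "target" };
	const char *rcu = model_get_link(NULL, &slow);

	if (!IS_ERR(rcu) || PTR_ERR(rcu) != -ECHILD)
		return -1;
	if (strcmp(model_get_link(&d, &slow), "target") != 0)
		return -1;
	if (strcmp(model_get_link(NULL, &fast), "target") != 0)
		return -1;
	return 0;
}
```

An instance that always bails out in RCU mode would simply return ERR_PTR(-ECHILD) whenever the dentry is NULL, which is what the message says most converted instances do at this point.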
2015-12-08 | don't put symlink bodies in pagecache into highmem | Al Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlock the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-07 | vfs: pull btrfs clone API to vfs layer | Christoph Hellwig
The btrfs clone ioctls are now adopted by other file systems, with NFS and CIFS already having support for them, and XFS being under active development. To avoid growth of various slightly incompatible implementations, add one to the VFS. Note that clones are different from file copies in several ways: - they are atomic vs other writers - they support whole file clones - they support 64-bit length clones - they do not allow partial success (aka short writes) - clones are expected to be a fast metadata operation Because of that it would be rather cumbersome to try to piggyback them on top of the recent copy_file_range infrastructure. The converse isn't true and the copy_file_range system call could try clone file range as a first attempt to copy, something that further patches will enable. Based on earlier work from Peng Tao. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-07 | locks: new locks_mandatory_area calling convention | Christoph Hellwig
Pass a loff_t end for the last byte instead of the 32-bit count parameter to allow full file clones even on 32-bit architectures. While we're at it also simplify the read/write selection. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 | fs/attr.c: is_sxid can be boolean | Yaowei Bai
This patch makes is_sxid return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 | fs/bad_inode.c: is_bad_inode can be boolean | Yaowei Bai
This patch makes is_bad_inode return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 | fs/dcache.c: is_subdir can be boolean | Yaowei Bai
This patch makes is_subdir return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 | fs/namespace.c: path_is_under can be boolean | Yaowei Bai
This patch makes path_is_under return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 | fs/block_dev.c: make sb_is_blkdev_sb return bool when CONFIG_BLOCK undefined | Yaowei Bai
Currently when CONFIG_BLOCK is defined sb_is_blkdev_sb returns bool, while when CONFIG_BLOCK is not defined it returns int. Let's keep consistent to make sb_is_blkdev_sb return bool as well when CONFIG_BLOCK isn't defined. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-01 | vfs: add copy_file_range syscall and vfs helper | Zach Brown
Add a copy_file_range() system call for offloading copies between regular files. This gives an interface to underlying layers of the storage stack which can copy without reading and writing all the data. There are a few candidates that should support copy offloading in the nearer term: - btrfs shares extent references with its clone ioctl - NFS has patches to add a COPY command which copies on the server - SCSI has a family of XCOPY commands which copy in the device This system call avoids the complexity of also accelerating the creation of the destination file by operating on an existing destination file descriptor, not a path. Currently the high level vfs entry point limits copy offloading to files on the same mount and super (and not in the same file). This can be relaxed if we get implementations which can copy between file systems safely. Signed-off-by: Zach Brown <zab@redhat.com> [Anna Schumaker: Change -EINVAL to -EBADF during file verification, Change flags parameter from int to unsigned int, Add function to include/linux/syscalls.h, Check copy len after file open mode, Don't forbid ranges inside the same file, Use rw_verify_area() to verify ranges, Use file_out rather than file_in, Add COPY_FR_REFLINK flag] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
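From user space the call operates on two already-open descriptors, as the message describes. The following is a minimal sketch, assuming a glibc that provides the copy_file_range() wrapper (2.27 or later) and a kernel and filesystem that accept the call; `copy_range` and `copy_demo` are hypothetical names, and flags is passed as 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Copy len bytes from the start of fd_in to the start of fd_out. */
static ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
	off64_t off_in = 0, off_out = 0;
	size_t done = 0;

	/* Short copies are allowed, so loop until everything is copied. */
	while (done < len) {
		ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
					    len - done, 0);
		if (n < 0)
			return -1;
		if (n == 0)	/* hit end of the source file */
			break;
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Self-check: copy five bytes between two temp files and read them back. */
static int copy_demo(void)
{
	char src[] = "/tmp/cfr-src-XXXXXX", dst[] = "/tmp/cfr-dst-XXXXXX";
	char buf[8] = { 0 };
	int in = mkstemp(src), out = mkstemp(dst);
	int ok = -1;

	if (in >= 0 && out >= 0 && write(in, "hello", 5) == 5 &&
	    copy_range(in, out, 5) == 5 &&
	    pread(out, buf, 5, 0) == 5 && memcmp(buf, "hello", 5) == 0)
		ok = 0;
	if (in >= 0) {
		close(in);
		unlink(src);
	}
	if (out >= 0) {
		close(out);
		unlink(dst);
	}
	return ok;
}
```

Passing explicit offsets leaves both descriptors' file positions untouched, which is why the source can be copied from offset 0 right after being written.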
2015-11-16locks: Allow disabling mandatory locking at compile timeJeff Layton
Mandatory locking appears to be almost unused and buggy and there appears no real interest in doing anything with it. Since effectively no one uses the code and since the code is buggy let's allow it to be disabled at compile time. I would just suggest removing the code but undoubtedly that will break some piece of userspace code somewhere. For the distributions that don't care about this piece of code this gives a nice starting point to make mandatory locking go away. Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jeff Layton <jeff.layton@primarydata.com> Cc: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-11-11Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs update from Al Viro: - misc stable fixes - trivial kernel-doc and comment fixups - remove never-used block_page_mkwrite() wrapper function, and rename the function that is _actually_ used to not have double underscores. * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: 9p: cache.h: Add #define of include guard vfs: remove stale comment in inode_operations vfs: remove unused wrapper block_page_mkwrite() binfmt_elf: Correct `arch_check_elf's description fs: fix writeback.c kernel-doc warnings fs: fix inode.c kernel-doc warning fs/pipe.c: return error code rather than 0 in pipe_write() fs/pipe.c: preserve alloc_file() error code binfmt_elf: Don't clobber passed executable's file header FS-Cache: Handle a write to the page immediately beyond the EOF marker cachefiles: perform test on s_blocksize when opening cache file. FS-Cache: Don't override netfs's primary_index if registering failed FS-Cache: Increase reference of parent after registering, netfs success debugfs: fix refcount imbalance in start_creating
2015-11-11vfs: remove stale comment in inode_operationsRoss Zwisler
The big warning comment that is currently at the end of struct inode_operations was added as part of this commit: 4aa7c6346be3 ("vfs: add i_op->dentry_open()") It was added to warn people not to use the newly added 'dentry_open' function pointer. This function pointer was removed as part of this commit: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") The comment was left behind and now refers to nothing, so remove it. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-10Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. For now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie
2015-11-07block: change ->make_request_fn() and users to return a queue cookieJens Axboe
No functional changes in this patch, but it prepares us for returning a more useful cookie related to the IO that was queued up. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>
2015-11-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge patch-bomb from Andrew Morton: - inotify tweaks - some ocfs2 updates (many more are awaiting review) - various misc bits - kernel/watchdog.c updates - Some of mm. I have a huge number of MM patches this time and quite a lot of it is quite difficult and much will be held over to next time. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) selftests: vm: add tests for lock on fault mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage mm: introduce VM_LOCKONFAULT mm: mlock: add new mlock system call mm: mlock: refactor mlock, munlock, and munlockall code kasan: always taint kernel on report mm, slub, kasan: enable user tracking by default with KASAN=y kasan: use IS_ALIGNED in memory_is_poisoned_8() kasan: Fix a type conversion error lib: test_kasan: add some testcases kasan: update reference to kasan prototype repo kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile kasan: various fixes in documentation kasan: update log messages kasan: accurately determine the type of the bad access kasan: update reported bug types for kernel memory accesses kasan: update reported bug types for not user nor kernel memory accesses mm/kasan: prevent deadlock in kasan reporting mm/kasan: don't use kasan shadow pointer in generic functions mm/kasan: MODULE_VADDR is not available on all archs ...
2015-11-05mm/filemap.c: make global sync not clear error status of individual inodesJunichi Nomura
filemap_fdatawait() is a function to wait for on-going writeback to complete but also consume and clear error status of the mapping set during writeback. The latter functionality is critical for applications to detect writeback error with system calls like fsync(2)/fdatasync(2). However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl, which don't check error status of individual mappings. As a result, fsync() may not be able to detect writeback error if events happen in the following order:

    Application                    System admin
    ----------------------------------------------------------
    write data on page cache
                                   Run sync command
                                   writeback completes with error
                                   filemap_fdatawait() clears error
    fsync returns success
    (but the data is not on disk)

This patch adds filemap_fdatawait_keep_errors() for call sites where writeback error is not handled so that they don't clear error status. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@gmail.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
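The lost-error sequence in this commit message can be reproduced with a toy userspace model. Everything here is hypothetical: `struct mapping_model` reduces a mapping to a single sticky error flag, and the `model_*` and `wait_*` functions only imitate the consume-versus-keep behaviour the message describes; none of it is kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy mapping: a single sticky writeback-error flag. */
struct mapping_model {
	bool wb_error;
};

/* Imitates filemap_fdatawait(): report the error and clear it. */
int wait_and_clear(struct mapping_model *m)
{
	bool had_error = m->wb_error;

	m->wb_error = false;
	return had_error ? -EIO : 0;
}

/* Imitates filemap_fdatawait_keep_errors(): wait, leave the flag alone. */
void wait_keep_errors(struct mapping_model *m)
{
	(void)m;
}

/* fsync(2) path: must see and consume the error. */
int model_fsync(struct mapping_model *m)
{
	return wait_and_clear(m);
}

/* sync(2) path before the patch: consumes the error and drops it. */
void model_sync_old(struct mapping_model *m)
{
	(void)wait_and_clear(m);
}

/* sync(2) path after the patch: the error survives for a later fsync. */
void model_sync_new(struct mapping_model *m)
{
	wait_keep_errors(m);
}
```

In this model, a failed writeback followed by `model_sync_old()` makes the next `model_fsync()` return 0, while `model_sync_new()` leaves the flag set so that `model_fsync()` returns -EIO once and then clears it.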
2015-10-22locks: cleanup posix_lock_inode_wait and flock_lock_inode_waitBenjamin Coddington
All callers use locks_lock_inode_wait() instead. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-10-22locks: introduce locks_lock_inode_wait()Benjamin Coddington
Users of the locks API commonly call either posix_lock_file_wait() or flock_lock_file_wait() depending upon the lock type. Add a new function locks_lock_inode_wait() which will check and call the correct function for the type of lock passed in. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
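The dispatch that this commit message describes can be sketched in userspace as below. This is a model under stated assumptions rather than the kernel function: `struct file_lock_model` keeps only a flags field, the `FL_POSIX` and `FL_FLOCK` values are local stand-ins, the two `*_wait_model()` helpers are hypothetical stubs that return 1 and 2 only so the chosen path is visible, and returning -EINVAL for an unknown type is this sketch's choice.

```c
#include <errno.h>

/* Local stand-ins for the lock-type flags. */
#define FL_POSIX 1
#define FL_FLOCK 2

/* Reduced model of a file lock: only the type flags matter here. */
struct file_lock_model {
	unsigned int fl_flags;
};

/* Hypothetical stub for the POSIX-lock wait path; returns a tag value. */
int posix_wait_model(struct file_lock_model *fl)
{
	(void)fl;
	return 1;
}

/* Hypothetical stub for the flock wait path; returns a tag value. */
int flock_wait_model(struct file_lock_model *fl)
{
	(void)fl;
	return 2;
}

/* Pick the wait routine that matches the lock type passed in. */
int locks_lock_wait_model(struct file_lock_model *fl)
{
	switch (fl->fl_flags & (FL_POSIX | FL_FLOCK)) {
	case FL_POSIX:
		return posix_wait_model(fl);
	case FL_FLOCK:
		return flock_wait_model(fl);
	default:
		/* Neither or both types set: rejected in this sketch. */
		return -EINVAL;
	}
}
```

A caller then needs a single call site instead of branching on the lock type itself.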