summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-11-23f2fs: fix to release discard entries during checkpointChao Yu
In f2fs_fill_super, if there is any IO error occurs during recovery, cached discard entries will be leaked, in order to avoid this, make write_checkpoint() handle memory release by itself, besides, move clear_prefree_segments to write_checkpoint for readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23f2fs: exclude free nids building and allocationChao Yu
During nid allocation, it needs to exclude building and allocating flow of free nids, this is because while building free nid cache, there are two steps: a) load free nids from unused nat entries in NAT pages, b) update free nid cache by checking nat journal. The two steps should be atomical, otherwise an used nid can be allocated as free one after a) and before b). This patch adds missing lock which covers build_free_nids in unlock_operation and f2fs_balance_fs_bg to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23f2fs: fix overflow due to condition check orderJaegeuk Kim
In the last ilen case, i was already increased, resulting in accessing out- of-boundary entry of do_replace and blkaddr. Fix to check ilen first to exit the loop. Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-22ext4: Add select for CONFIG_FS_IOMAPJan Kara
When ext4 is compiled with DAX support, it now needs the iomap code. Add appropriate select to Kconfig. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-22NFSv4.x: hide array-bounds warningArnd Bergmann
A correct bugfix introduced a harmless warning that shows up with gcc-7: fs/nfs/callback.c: In function 'nfs_callback_up': fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds] What happens here is that the 'minorversion == 0' check tells the compiler that we assume minorversion can be something other than 0, but when CONFIG_NFS_V4_1 is disabled that would be invalid and result in an out-of-bounds access. The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this really can't happen, which makes the code slightly smaller and also avoids the warning. The bugfix that introduced the warning is marked for stable backports, we want this one backported to the same releases. Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net") Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method becuase those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failue. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22fs: logfs: remove unnecesary checkMing Lei
The check on bio->bi_vcnt doesn't make sense in erase_end_io(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in do_erase()Ming Lei
Also code gets simplified a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in __bdev_writeseg()Ming Lei
Also this patch simplify the code a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: convert to bio_add_page() in sync_request()Ming Lei
Always bio_add_page() is the standard and preferred way to do the task. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block: bio: pass bvec table to bio_init()Ming Lei
Some drivers often use external bvec table, so introduce this helper for this case. It is always safe to access the bio->bi_io_vec in this way for this case. After converting to this usage, it will becomes a bit easier to evaluate the remaining direct access to bio->bi_io_vec, so it can help to prepare for the following multipage bvec support. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up the new O_DIRECT cases. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block_dev: get rid of blksize bits calculationJens Axboe
We store the bits in the bdev sector size locally, but we don't use the calculation anymore. All we do with it is shift it back up to the bdev sector size. So let's just use that directly and kill the variable and bits calculation. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block_dev: Fixed direct I/O bio sector calculationDamien Le Moal
A direct I/O alignment must be always checked against the device blocks size, but the I/O offset (bio->bi_iter.bi_sector must always use 512B sector unit, and not the actual logical block size. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-21NFSv4.1: Keep a reference on lock states while checkingBenjamin Coddington
While walking the list of lock_states, keep a reference on each nfs4_lock_state to be checked, otherwise the lock state could be removed while the check performs TEST_STATEID and possible FREE_STATEID. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-21ext4: avoid lockdep warning when inheriting encryption contextEric Biggers
On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep warning when run on ext4 mounted with -o test_dummy_encryption: xfs_io/4594 is trying to acquire lock: (jbd2_handle ){++++.+}, at: [<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b but task is already holding lock: (jbd2_handle ){++++.+}, at: [<ffffffff813000de>] start_this_handle+0x354/0x3d8 The abbreviated call stack is: [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6 [<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6 [<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18 [<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79 [<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f [<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b [<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103 [<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a [<ffffffff812bd371>] ext4_create+0xb7/0x161 When a file is created in an encrypted directory, ext4_set_context() is called to set an encryption context on the new file. This calls ext4_xattr_set(), which contains a retry loop where the journal is forced to commit if an ENOSPC error is encountered. If the task actually were to wait for the journal to commit in this case, then it would deadlock because a handle remains open from __ext4_new_inode(), so the running transaction can't be committed yet. Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not allowing the running transaction to be committed while the current task has it open. However, the above lockdep warning is still triggered. This was a false positive which was introduced by: 1eaa566d368b: jbd2: track more dependencies on transaction commit Fix the problem by passing the handle through the 'fs_data' argument to ext4_set_context(), then using ext4_xattr_set_handle() instead of ext4_xattr_set(). And in the case where no journal handle is specified and ext4_set_context() has to open one, add an ENOSPC retry loop since in that case it is the outermost transaction. Signed-off-by: Eric Biggers <ebiggers@google.com>
2016-11-21ext4: remove unused function ext4_aligned_io()Ross Zwisler
The last user of ext4_aligned_io() was the DAX path in ext4_direct_IO_write(). This usage was removed by Jan Kara's patch entitled "ext4: Rip out DAX handling from direct IO path". Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20dax: rip out get_block based IO supportJan Kara
No one uses functions using the get_block callback anymore. Rip them out and update documentation. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext2: use iomap_zero_range() for zeroing truncated page in DAX pathJan Kara
Currently the last user of ext2_get_blocks() for DAX inodes was dax_truncate_page(). Convert that to iomap_zero_range() so that all DAX IO uses the iomap path. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: rip out DAX handling from direct IO pathJan Kara
Reads and writes for DAX inodes should no longer end up in direct IO code. Rip out the support and add a warning. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: convert DAX faults to iomap infrastructureJan Kara
Convert DAX faults to use iomap infrastructure. We would not have to start transaction in ext4_dax_fault() anymore since ext4_iomap_begin takes care of that but so far we do that to avoid lock inversion of transaction start with DAX entry lock which gets acquired in dax_iomap_fault() before calling ->iomap_begin handler. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: avoid split extents for DAX writesJan Kara
Currently mapping of blocks for DAX writes happen with EXT4_GET_BLOCKS_PRE_IO flag set. That has a result that each ext4_map_blocks() call creates a separate written extent, although it could be merged to the neighboring extents in the extent tree. The reason for using this flag is that in case the extent is unwritten, we need to convert it to written one and zero it out. However this "convert mapped range to written" operation is already implemented by ext4_map_blocks() for the case of data writes into unwritten extent. So just use flags for that mode of operation, simplify the code, and avoid unnecessary split extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: DAX iomap write supportJan Kara
Implement DAX writes using the new iomap infrastructure instead of overloading the direct IO path. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: use iomap for zeroing blocks in DAX modeJan Kara
Use iomap infrastructure for zeroing blocks when in DAX mode. ext4_iomap_begin() handles read requests just fine and that's all that is needed for iomap_zero_range(). Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: convert DAX reads to iomap infrastructureJan Kara
Implement basic iomap_begin function that handles reading and use it for DAX reads. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: only set S_DAX if DAX is really supportedJan Kara
Currently we have S_DAX set inode->i_flags for a regular file whenever ext4 is mounted with dax mount option. However in some cases we cannot really do DAX - e.g. when inode is marked to use data journalling, when inode data is being encrypted, or when inode is stored inline. Make sure S_DAX flag is appropriately set/cleared in these cases. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: factor out checks from ext4_file_write_iter()Jan Kara
Factor out checks of 'from' and whether we are overwriting out of ext4_file_write_iter() so that the function is easier to follow. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-19Merge tag 'ext4_for_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A security fix (so a maliciously corrupted file system image won't panic the kernel) and some fixes for CONFIG_VMAP_STACK" * tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: sanity check the block and cluster size at mount time fscrypto: don't use on-stack buffer for key derivation fscrypto: don't use on-stack buffer for filename encryption
2016-11-19ext4: sanity check the block and cluster size at mount timeTheodore Ts'o
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-19fscrypto: don't use on-stack buffer for key derivationEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. get_crypt_info() was using a stack buffer to hold the output from the encryption operation used to derive the per-file key. Fix it by using a heap buffer. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-19fscrypto: don't use on-stack buffer for filename encryptionEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. For short filenames, fname_encrypt() was encrypting a stack buffer holding the padded filename. Fix it by encrypting the filename in-place in the output buffer, thereby making the temporary buffer unnecessary. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-18Merge tag 'v4.9-rc4' into soundJonathan Corbet
Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_stateBenjamin Coddington
Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() . Instead of marking state recovery as failed, mark the state for recovery again. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18NFSv4: Don't call close if the open stateid has already been clearedTrond Myklebust
Ensure we test to see if the open stateid is actually set, before we send a CLOSE. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18ext4: add sanity checking to count_overhead()Theodore Ts'o
The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18NFSv4: Fix CLOSE races with OPENTrond Myklebust
If the reply to a successful CLOSE call races with an OPEN to the same file, we can end up scribbling over the stateid that represents the new open state. The race looks like: Client Server ====== ====== CLOSE stateid A on file "foo" CLOSE stateid A, return stateid C OPEN file "foo" OPEN "foo", return stateid B Receive reply to OPEN Reset open state for "foo" Associate stateid B to "foo" Receive CLOSE for A Reset open state for "foo" Replace stateid B with C The fix is to examine the argument of the CLOSE, and check for a match with the current stateid "other" field. If the two do not match, then the above race occurred, and we should just ignore the CLOSE. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18NFSv4.1: Fix a regression in DELEGRETURNTrond Myklebust
We don't want to call nfs4_free_revoked_stateid() in the case where the delegreturn was successful. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18ext4: use more strict checks for inodes_per_block on mountTheodore Ts'o
Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2016-11-18ext4: fix in-superblock mount options processingTheodore Ts'o
Fix a large number of problems with how we handle mount options in the superblock. For one, if the string in the superblock is long enough that it is not null terminated, we could run off the end of the string and try to interpret superblocks fields as characters. It's unlikely this will cause a security problem, but it could result in an invalid parse. Also, parse_options is destructive to the string, so in some cases if there is a comma-separated string, it would be modified in the superblock. (Fortunately it only happens on file systems with a 1k block size.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18ext4: sanity check the block and cluster size at mount timeTheodore Ts'o
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of regression fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix iov_iter_advance() for ITER_PIPE xattr: Fix setting security xattrs on sockfs
2016-11-17Merge tag 'for-linus-4.9-rc5-ofs-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs fix from Mike Marshall: "orangefs: add .owner to debugfs file_operations Without ".owner = THIS_MODULE" it is possible to crash the kernel by unloading the Orangefs module while someone is reading debugfs files" * tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: add .owner to debugfs file_operations
2016-11-17block: new direct I/O implementationChristoph Hellwig
Similar to the simple fast path, but we now need a dio structure to track multiple-bio completions. It's basically a cut-down version of the new iomap-based direct I/O code for filesystems, but without all the logic to call into the filesystem for extent lookup or allocation, and without the complex I/O completion workqueue handler for AIO - instead we just use the FUA bit on the bios to ensure data is flushed to stable storage. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: make __blkdev_direct_IO_sync() support O_SYNC/DSYNCJens Axboe
Split the op setting code into a helper, use it in both places. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: support a full bio worth of IO for simplified bdev direct-ioJens Axboe
Just alloc the bio_vec array if we exceed the inline limit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: fast-path for small and simple direct I/O requestsChristoph Hellwig
This patch adds a small and simple fast patch for small direct I/O requests on block devices that don't use AIO. Between the neat bio_iov_iter_get_pages helper that avoids allocating a page array for get_user_pages and the on-stack bio and biovec this avoid memory allocations and atomic operations entirely in the direct I/O code (lower levels might still do memory allocations and will usually have at least some atomic operations, though). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17xenfs: Use proc_create_mount_point() to create /proc/xenSeth Forshee
Mounting proc in user namespace containers fails if the xenbus filesystem is mounted on /proc/xen because this directory fails the "permanently empty" test. proc_create_mount_point() exists specifically to create such mountpoints in proc but is currently proc-internal. Export this interface to modules, then use it in xenbus when creating /proc/xen. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-17xattr: Fix setting security xattrs on sockfsAndreas Gruenbacher
The IOP_XATTR flag is set on sockfs because sockfs supports getting the "system.sockprotoname" xattr. Since commit 6c6ef9f2, this flag is checked for setxattr support as well. This is wrong on sockfs because security xattr support there is supposed to be provided by security_inode_setsecurity. The smack security module relies on socket labels (xattrs). Fix this by adding a security xattr handler on sockfs that returns -EAGAIN, and by checking for -EAGAIN in setxattr. We cannot simply check for -EOPNOTSUPP in setxattr because there are filesystems that neither have direct security xattr support nor support via security_inode_setsecurity. A more proper fix might be to move the call to security_inode_setsecurity into sockfs, but it's not clear to me if that is safe: we would end up calling security_inode_post_setxattr after that as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "A regression fix and bug fix bound for stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix fuse_write_end() if zero bytes were copied fuse: fix root dentry initialization
2016-11-16orangefs: add .owner to debugfs file_operationsMike Marshall
Without ".owner = THIS_MODULE" it is possible to crash the kernel by unloading the Orangefs module while someone is reading debugfs files. Signed-off-by: Mike Marshall <hubcap@omnibond.com>