path: root/fs
Age | Commit message | Author
2016-12-05 | CIFS: Fix a possible double locking of mutex during reconnect | Pavel Shilovsky
With the current code it is possible to lock a mutex twice when subsequent reconnects are triggered. On the 1st reconnect we reconnect sessions and tcons and then persistent file handles. If the 2nd reconnect happens during the reconnecting of persistent file handles then the following sequence of calls is observed: cifs_reopen_file -> SMB2_open -> small_smb2_init -> smb2_reconnect -> cifs_reopen_persistent_file_handles -> cifs_reopen_file (again!). So, we are trying to acquire the same cfile->fh_mutex twice, which is wrong. Fix this by moving reconnecting of persistent handles to the delayed work (smb2_reconnect_server) and submitting this work every time we reconnect tcon in SMB2 commands handling codepath. This can also lead to corruption of a temporary file list in cifs_reopen_persistent_file_handles() because we can recursively call this function twice. Cc: Stable <stable@vger.kernel.org> # v4.9+ Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
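The recursion described above can be modelled in userspace. The following is a minimal sketch, not the CIFS code: every name in it (`file_handle`, `reconnect`, `run_pending_work`, and so on) is hypothetical, the mutex is modelled as a flag, and the delayed work is modelled as a pending bit. It only illustrates why deferring the reopen keeps the handle lock from being taken twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the CIFS code. */
struct file_handle {
    bool locked;        /* models a non-recursive mutex */
    int  reopen_count;  /* how many times the handle was reopened */
};

static bool reopen_work_pending;   /* models a queued delayed work item */
static bool need_reconnect;        /* set when the connection has dropped */

static void lock_handle(struct file_handle *fh)
{
    assert(!fh->locked);   /* a second acquisition would deadlock */
    fh->locked = true;
}

static void unlock_handle(struct file_handle *fh)
{
    fh->locked = false;
}

/* Command path: re-establish the connection, but only schedule the reopen. */
static void reconnect(void)
{
    if (!need_reconnect)
        return;
    need_reconnect = false;
    reopen_work_pending = true;    /* deferred, never run inline */
}

static void reopen_file(struct file_handle *fh)
{
    lock_handle(fh);
    reconnect();                   /* sending the open may hit a reconnect */
    fh->reopen_count++;
    unlock_handle(fh);
}

/* Worker context: runs with no handle lock held. */
static void run_pending_work(struct file_handle *fh)
{
    while (reopen_work_pending) {
        reopen_work_pending = false;
        reopen_file(fh);
    }
}
```

If `reconnect()` called `reopen_file()` directly instead of setting the pending bit, the assertion in `lock_handle()` would fire on a nested reconnect, which is the userspace analogue of the double lock.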
2016-12-05 | CIFS: Fix a possible memory corruption during reconnect | Pavel Shilovsky
We can not unlock/lock cifs_tcp_ses_lock while walking through ses and tcon lists because it can corrupt list iterator pointers and a tcon structure can be released if we don't hold an extra reference. Fix it by moving a reconnect process to a separate delayed work and acquiring a reference to every tcon that needs to be reconnected. Also do not send an echo request on newly established connections. CC: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 | f2fs: call sync_fs when f2fs is idle | Jaegeuk Kim
The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 | Revert "f2fs: use percpu_counter for # of dirty pages in inode" | Jaegeuk Kim
This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76. The percpu_counter doesn't provide atomicity on a single core and consumes more DRAM. That causes fs_mark test failures due to ENOMEM. Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 | [iov_iter] new primitives - copy_from_iter_full() and friends | Al Viro
copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et al., advancing the iterator only in case of a successful full copy and returning whether it had been successful or not. Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in the failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
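To make the difference between the two semantics concrete, here is a small userspace sketch. It assumes a toy single-segment iterator (`struct toy_iter`, hypothetical) rather than the kernel's `struct iov_iter`, and the `toy_` functions only mimic the contract described above; they are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy single-segment iterator; the kernel's struct iov_iter is far richer. */
struct toy_iter {
    const char *data;
    size_t      pos;
    size_t      len;
};

/* Short-copy semantics: copy what is available and advance by that much. */
static size_t toy_copy_from_iter(void *dst, size_t bytes, struct toy_iter *it)
{
    size_t avail, n;

    assert(it->pos <= it->len);
    avail = it->len - it->pos;
    n = bytes < avail ? bytes : avail;
    memcpy(dst, it->data + it->pos, n);
    it->pos += n;
    return n;
}

/* All-or-nothing semantics: advance only if the whole request can be met. */
static bool toy_copy_from_iter_full(void *dst, size_t bytes, struct toy_iter *it)
{
    assert(it->pos <= it->len);
    if (bytes > it->len - it->pos)
        return false;               /* iterator is left untouched */
    memcpy(dst, it->data + it->pos, bytes);
    it->pos += bytes;
    return true;
}
```

A caller that wants to consume whatever is available, such as a loop doing short reads, needs the first form. The second form suits callers that read a fixed-size header and treat anything less as an error.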
2016-12-05 | CIFS: Fix a possible memory corruption in push locks | Pavel Shilovsky
If maxBuf is not 0 but less than a size of SMB2 lock structure we can end up with a memory corruption. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
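A guard of the kind this message implies might look like the sketch below. The record layout and the function name are hypothetical stand-ins, not the SMB2 definitions. The point is that the element count is derived by integer division, so a buffer smaller than one record yields a count of zero and must be rejected before anything is allocated or written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size lock record, standing in for the SMB2 structure. */
struct toy_lock_element {
    uint64_t offset;
    uint64_t length;
    uint32_t flags;
    uint32_t reserved;
};

/* Compute how many lock records fit in one request, or fail cleanly. */
static int toy_locks_per_request(size_t max_buf, size_t *max_num)
{
    assert(max_num != NULL);
    if (max_buf < sizeof(struct toy_lock_element))
        return -EINVAL;     /* not even one record fits */
    *max_num = max_buf / sizeof(struct toy_lock_element);
    return 0;
}
```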
2016-12-05 | CIFS: Fix missing nls unload in smb2_reconnect() | Pavel Shilovsky
Cc: Stable <stable@vger.kernel.org> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 | xfs: optimise CRC updates | Dave Chinner
Nick Piggin reported that the CRC overhead in an fsync heavy workload was higher than expected on a Power8 machine. Part of this was to do with the fact that the power8 CRC implementation is not efficient for CRC lengths of less than 512 bytes, and so the way we split the CRCs over the CRC field means a lot of the CRCs are reduced to being less than optimal size. To optimise this, change the CRC update mechanism to zero the CRC field first, and then compute the CRC in one pass over the buffer and write the result back into the buffer. We can do this safely because anything writing a CRC has exclusive access to the buffer the CRC is being calculated over. We leave the CRC verify code the same - it still splits the CRC calculation - because we do not want read-only operations modifying the underlying buffer. This is because read-only operations are not guaranteed exclusive access to the buffer, and so temporary modifications could leak out to other processes accessing the buffer concurrently. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
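The two strategies can be compared in a self-contained sketch. It assumes a slow bitwise CRC32C and stores the checksum in host byte order, and all `toy_` names are hypothetical; the XFS helpers differ in both respects. The writer zeroes the field and checksums the buffer in one pass, while the reader leaves the buffer untouched and feeds four zero bytes in place of the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli polynomial, reflected); slow but short. */
static uint32_t toy_crc32c(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Writer: has exclusive access, so zero the field and checksum in one pass. */
static void toy_update_cksum(uint8_t *buf, size_t len, size_t off)
{
    uint32_t crc;

    assert(off + sizeof(crc) <= len);
    memset(buf + off, 0, sizeof(crc));
    crc = ~toy_crc32c(~0u, buf, len);
    memcpy(buf + off, &crc, sizeof(crc));
}

/* Reader: must not modify buf, so checksum around the field in three steps. */
static int toy_verify_cksum(const uint8_t *buf, size_t len, size_t off)
{
    uint32_t zero = 0, stored, crc;

    assert(off + sizeof(crc) <= len);
    memcpy(&stored, buf + off, sizeof(stored));
    crc = toy_crc32c(~0u, buf, off);                 /* bytes before the field */
    crc = toy_crc32c(crc, &zero, sizeof(zero));      /* field treated as zero */
    crc = toy_crc32c(crc, buf + off + sizeof(crc),   /* bytes after the field */
                     len - off - sizeof(crc));
    return stored == (uint32_t)~crc;
}
```

Both paths checksum the same byte stream, so they agree; the single-pass writer simply avoids handing short lengths to the CRC routine.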
2016-12-05 | xfs: make xfs btree stats less huge | Dave Chinner
Embedding a switch statement in every btree stats inc/add adds a lot of code overhead to the core btree infrastructure paths. Stats are supposed to be small and lightweight, but the btree stats have become big and bloated as we've added more btrees. It needs fixing because the reflink code will just add more overhead again. Convert the v2 btree stats to arrays instead of independent variables, and instead use the type to index the specific btree array via an enum. This allows us to use array based indexing to update the stats, rather than having to dereference variables specific to the btree type. If we then wrap the xfsstats structure in a union and place a uint32_t array beside it, and calculate the correct btree stats array base index when creating a btree cursor, we can easily access entries in the stats structure without having to switch names based on the btree type. We then replace the switch statement with a simple set of stats wrapper macros, resulting in a significant simplification of the btree stats code, and:

   text    data     bss     dec     hex filename
  48905     144       8   49057    bfa1 fs/xfs/libxfs/xfs_btree.o.old
  36793     144       8   36945    9051 fs/xfs/libxfs/xfs_btree.o

it reduces the core btree infrastructure code size by close to 25%! Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
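The union-plus-base-index idea can be sketched as follows. This is a heavily trimmed, hypothetical layout with two btree types and three counters each; the names are not the XFS ones, and an inline function is used here where the commit describes wrapper macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-btree counters; the real structure has many more. */
enum toy_stat { TOY_LOOKUP, TOY_COMPARE, TOY_INSREC, TOY_NSTATS };

struct toy_stats_named {
    uint32_t bno[TOY_NSTATS];   /* counters for one btree type */
    uint32_t cnt[TOY_NSTATS];   /* counters for another btree type */
};

/* The same storage, viewed either by name or as a flat array. */
union toy_stats {
    struct toy_stats_named s;
    uint32_t a[sizeof(struct toy_stats_named) / sizeof(uint32_t)];
};

/* Base index of one btree's counter block within the flat array. */
#define TOY_BASE(member) \
    (offsetof(struct toy_stats_named, member) / sizeof(uint32_t))

/* The base is computed once, when the cursor is created. */
struct toy_cursor {
    size_t stat_base;
};

/* One code path for every btree type: no switch on the type. */
static void toy_stats_inc(union toy_stats *st, const struct toy_cursor *cur,
                          enum toy_stat stat)
{
    size_t idx = cur->stat_base + (size_t)stat;

    assert(idx < sizeof(st->a) / sizeof(st->a[0]));
    st->a[idx]++;
}
```

Reporting code can still read `st.s.cnt[TOY_COMPARE]` by name, while the hot path pays only for an add and an indexed increment.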
2016-12-05 | xfs: don't cap maximum dedupe request length | Darrick J. Wong
After various discussions on linux-fsdevel, it has been decided that it is not necessary to cap the length of a dedupe request, and that correctly-written userspace client programs will be able to absorb the change. Therefore, remove the length clamping behavior. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: don't allow di_size with high bit set | Darrick J. Wong
The on-disk field di_size is used to set i_size, which is a signed integer of loff_t. If the high bit of di_size is set, we'll end up with a negative i_size, which will cause all sorts of problems. Since the VFS won't let us create a file with such length, we should catch them here in the verifier too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
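The sign problem is easy to demonstrate outside the kernel. The sketch below uses hypothetical helper names and plain `int64_t` in place of `loff_t`; it shows only the shape of the check, not the XFS verifier itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when an unsigned 64-bit on-disk size fits a signed 64-bit offset. */
static bool toy_disk_size_ok(uint64_t di_size)
{
    return (di_size & (UINT64_C(1) << 63)) == 0;
}

/* Convert to the in-memory signed size; callers must have verified first. */
static int64_t toy_disk_size_to_isize(uint64_t di_size)
{
    assert(toy_disk_size_ok(di_size));
    return (int64_t)di_size;
}
```

Rejecting the value in the verifier means the conversion never sees an input that would come out negative.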
2016-12-05 | xfs: error out if trying to add attrs and anextents > 0 | Darrick J. Wong
We shouldn't assert if somehow we end up trying to add an attr fork to an inode that apparently already has attr extents because this is an indication of on-disk corruption. Instead, return an error code to userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: don't crash if reading a directory results in an unexpected hole | Darrick J. Wong
In xfs_dir3_data_read, we can encounter the situation where err == 0 and *bpp == NULL if the given bno offset happens to be a hole; this leads to a crash if we try to set the buffer type after the _da_read_buf call. Holes can happen due to corrupt or malicious entries in the bmbt data, so be a little more careful when we're handling buffers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
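The "success but no buffer" case can be sketched with stand-in types. Everything below is hypothetical (`toy_buf`, `toy_read_buf`, the type constants); it only mirrors the pattern of checking both the error code and the returned pointer before touching the buffer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
    int type;
};

enum { TOY_BUF_NONE = 0, TOY_BUF_DIR_DATA = 1 };

/* Hypothetical low-level read: returns 0 with *bpp == NULL for a hole. */
static int toy_read_buf(int is_hole, struct toy_buf *backing,
                        struct toy_buf **bpp)
{
    *bpp = is_hole ? NULL : backing;
    return 0;
}

/* Caller: tag the buffer only if the read succeeded and produced one. */
static int toy_dir_data_read(int is_hole, struct toy_buf *backing,
                             struct toy_buf **bpp)
{
    int err;

    assert(bpp != NULL);
    err = toy_read_buf(is_hole, backing, bpp);
    if (!err && *bpp)
        (*bpp)->type = TOY_BUF_DIR_DATA;
    return err;
}
```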
2016-12-05 | xfs: complain if we don't get nextents bmap records | Darrick J. Wong
When reading into memory all extents of a btree-format inode fork, complain if the number of extents we find is not the same as the number of extents reported in the inode core. This is needed to stop an IO action from accessing the garbage areas of the in-core fork. [dchinner: removed redundant assert] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: check for bogus values in btree block headers | Darrick J. Wong
When we're reading a btree block, make sure that what we retrieved matches the owner and level; and has a plausible number of records. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: forbid AG btrees with level == 0 | Darrick J. Wong
There is no such thing as a zero-level AG btree since even a single-node zero-records btree has one level. Btree cursor constructors read cur_nlevels straight from disk and then access things like cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero! Therefore, strengthen the verifiers to prevent this possibility. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
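Why a level count of zero is dangerous follows from unsigned arithmetic: `nlevels - 1` wraps around instead of going negative. The sketch below uses a hypothetical upper bound and hypothetical names; it shows the range check a verifier would apply before the value is used as an array index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap; the real maximum depends on the filesystem geometry. */
#define TOY_MAX_LEVELS 9

/* Verifier-style check: a btree always has at least one level. */
static bool toy_btree_levels_ok(uint32_t nlevels)
{
    return nlevels >= 1 && nlevels <= TOY_MAX_LEVELS;
}

/* Index of the root slot in a per-level array; valid only after the check. */
static uint32_t toy_root_slot(uint32_t nlevels)
{
    assert(toy_btree_levels_ok(nlevels));
    return nlevels - 1;     /* would wrap to UINT32_MAX if nlevels were 0 */
}
```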
2016-12-05 | xfs: several xattr functions can be void | Eric Sandeen
There are a handful of xattr functions which now return nothing but zero. They can be made void, chased through calling functions, and error handling etc can be removed. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: handle cow fork in xfs_bmap_trace_exlist | Eric Sandeen
By inspection, xfs_bmap_trace_exlist isn't handling cow forks, and will trace the data fork instead. Fix this by setting state appropriately if whichfork == XFS_COW_FORK. ()___() < @ @ > | | {o_o} (|) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: pass state not whichfork to trace_xfs_extlist | Eric Sandeen
When xfs_bmap_trace_exlist called trace_xfs_extlist, it sent in the "whichfork" var instead of the bmap "state" as expected (even though state was already set up for this purpose). As a result, the xfs_bmap_class in tracing code used "whichfork" not state in xfs_iext_state_to_fork(), and got the wrong ifork pointer. It all goes downhill from there, including an ASSERT when ifp_bytes is empty by the time it reaches xfs_iext_get_ext(): XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: Move AGI buffer type setting to xfs_read_agi | Eric Sandeen
We've missed properly setting the buffer type for an AGI transaction in 3 spots now, so just move it into xfs_read_agi() and set it if we are in a transaction to avoid the problem in the future. This is similar to how it is done in, e.g., the dir3 and attr3 read functions. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 | xfs: set AGI buffer type in xlog_recover_clear_agi_bucket | Eric Sandeen
xlog_recover_clear_agi_bucket didn't set the type to XFS_BLFT_AGI_BUF, so we got a warning during log replay (or an ASSERT on a debug build). XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 Fix this, as was done in f19b872b for 2 other locations with the same problem. cc: <stable@vger.kernel.org> # 3.10 to current Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-04 | NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() | Trond Myklebust
If the session has an error, then we want to start by recovering the session, as any SEQUENCE we send is going to fail with a session error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 | NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE | Trond Myklebust
In the case where SEQUENCE receives a NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, we just want to report the session as needing recovery, and then we want to retry the operation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 | NFS: Only look at the change attribute cache state in nfs_check_verifier | Trond Myklebust
When looking at whether or not our dcache is valid, we really don't care about the general state of the directory attribute cache. Instead, we only care about the state of the change attribute. This fixes a performance issue when the client is responsible for changing the directory contents; a number of NFSv4 operations will atomically update the directory change attribute, but may not return all the other attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 | don't open-code file_inode() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-04 | NFS: Fix incorrect size revalidation when holding a delegation | Trond Myklebust
We should only care about checking the attributes if the page cache is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the NFS_INO_REVAL_FORCED flag is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 | NFS: Fix incorrect mapping revalidation when holding a delegation | Trond Myklebust
We should only care about checking the attributes if the page cache is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the NFS_INO_REVAL_FORCED flag is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | autofs - dont hold spin lock over direct mount expire | Ian Kent
Commit 7cbdb4a286 altered the autofs indirect mount expire to not hold a spin lock during the expire check. The direct mount expire needs the same treatment because, to make autofs expires namespace aware, may_umount_tree() needs to use a similar method to may_umount() when checking if a mount tree is in use. This means may_umount_tree() will end up taking the namespace_sem for the check so the autofs direct mount expire won't be allowed to hold a spin lock over the check. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs - constify misc struct path instances | Ian Kent
Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: remove unused have_submounts() function | Ian Kent
Now that path_has_submounts() has been added have_submounts() is no longer used so remove it. Link: http://lkml.kernel.org/r/20161011053428.27645.12310.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: use path_has_submounts() to fix unreliable have_submount() checks | Ian Kent
If an automount mount is clone(2)ed into a file system that is propagation private, when it later expires in the originating namespace, subsequent calls to autofs ->d_automount() for that dentry in the original namespace will return ELOOP until the mount is umounted in the cloned namespace. Now that a struct path is available where needed use path_has_submounts() instead of have_submounts() so we don't get false positives when checking if a dentry is a mount point or contains mounts in the current namespace. Link: http://lkml.kernel.org/r/20161011053423.27645.91233.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: use path_is_mountpoint() to fix unreliable d_mountpoint() checks | Ian Kent
If an automount mount is clone(2)ed into a file system that is propagation private, when it later expires in the originating namespace, subsequent calls to autofs ->d_automount() for that dentry in the original namespace will return ELOOP until the mount is umounted in the cloned namespace. Now that a struct path is available where needed use path_is_mountpoint() instead of d_mountpoint() so we don't get false positives when checking if a dentry is a mount point in the current namespace. Link: http://lkml.kernel.org/r/20161011053418.27645.15241.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: change autofs4_wait() to take struct path | Ian Kent
In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Now change autofs4_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053413.27645.84666.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: change autofs4_expire_wait()/do_expire_wait() to take struct path | Ian Kent
In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Start by changing autofs4_expire_wait() and do_expire_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053408.27645.40091.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: add path_has_submounts() | Ian Kent
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using the given dentry, possibly in a different namespace. Add a function, path_has_submounts(), that checks if a struct path contains mounts (or is a mountpoint itself) to handle this case. Link: http://lkml.kernel.org/r/20161011053403.27645.55242.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: add path_is_mountpoint() helper | Ian Kent
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using a given dentry that may be in a different namespace. Add a helper function, path_is_mountpoint(), that checks if a struct path is a mountpoint for this case. Link: http://lkml.kernel.org/r/20161011053358.27645.9729.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | ext4: remove another test in ext4_alloc_file_blocks() | Dan Carpenter
Before commit c3fe493ccdb1 ('ext4: remove unneeded test in ext4_alloc_file_blocks()'), it was possible for "depth" to be -1, but now it cannot be negative. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03 | ext4: fix checks for data=ordered and journal_async_commit options | Jan Kara
Combination of data=ordered mode and journal_async_commit mount option is invalid. However the check in parse_options() fails to detect the case where we simply end up defaulting to data=ordered mode and we detect the problem only on remount which triggers hard to understand failure to remount the filesystem. Fix the checking of mount options to take into account also the default mode by moving the check somewhat later in the mount sequence. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-03 | mbcache: document that "find" functions only return reusable entries | Eric Biggers
mb_cache_entry_find_first() and mb_cache_entry_find_next() only return cache entries with the 'e_reusable' bit set. This should be documented. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03 | mbcache: use consistent type for entry count | Eric Biggers
mbcache used several different types to represent the number of entries in the cache. For consistency within mbcache and with the shrinker API, always use unsigned long. This does not change behavior for current mbcache users (ext2 and ext4) since they limit the entry count to a value which easily fits in an int. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03 | mbcache: remove unnecessary module_get/module_put | Eric Biggers
When mbcache is built as a module, any modules that use it (ext2 and/or ext4) will depend on its symbols directly, incrementing its reference count. Therefore, there is no need to do module_get/module_put. Also note that since the module_get/module_put were in the mbcache module itself, executing those lines of code was already dependent on another reference to the mbcache module being held. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03 | pNFS/flexfiles: Support sending layoutstats in layoutreturn | Trond Myklebust
Add the ability to send an array of layoutstats entries as part of layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | NFS: Fix up read of mirror stats | Trond Myklebust
Need to lock while reading in order to ensure 64-bit reads are correct. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | pNFS/flexfiles: Clean up layoutstats | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | pNFS/flexfiles: Refactor encoding of the layoutreturn payload | Trond Myklebust
Add the layout error payload to the flexfiles layoutreturn private data, and set up the encoding mechanisms. This is a refactoring in preparation for adding the layout iostats payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | mbcache: don't BUG() if entry cache cannot be allocated | Eric Biggers
mbcache can be a module that is loaded long after startup, when someone asks to mount an ext2 or ext4 filesystem. Therefore it should not BUG() if kmem_cache_create() fails, but rather just fail the module load. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
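The change in behaviour amounts to returning an error from the init routine instead of halting. A userspace sketch of that pattern follows; the allocator is passed in so that failure can be simulated, and all names are hypothetical rather than taken from mbcache.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static void *toy_entry_cache;

/* Init routine: report allocation failure to the caller instead of aborting. */
static int toy_cache_init(void *(*alloc)(size_t))
{
    assert(alloc != NULL);
    toy_entry_cache = alloc(64);
    if (!toy_entry_cache)
        return -ENOMEM;     /* the load fails; the system keeps running */
    return 0;
}

/* Test double that always fails, to exercise the error path. */
static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```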
2016-12-03 | mbcache: correctly handle 'e_referenced' bit | Eric Biggers
mbcache entries have an 'e_referenced' bit which users can set with mb_cache_entry_touch() to indicate that an entry should be given another pass through the LRU list before the shrinker can delete it. However, mb_cache_shrink() actually would, when seeing an e_referenced entry at the front of the list (the least-recently used end), place it right at the front of the list again. The next iteration would then remove the entry from the list and delete it. Consequently, e_referenced had essentially no effect, so ext2/ext4 xattr blocks would sometimes not be reused as often as expected. Fix this by making the shrinker move e_referenced entries to the back of the list rather than the front. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
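The corrected shrinker behaviour is a form of second-chance eviction. The sketch below models the LRU list as a small array with hypothetical names; it is not the mbcache implementation, but it follows the description above: a referenced entry at the least-recently-used end has its bit cleared and is moved to the back, and only unreferenced entries are evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP 8

struct toy_entry {
    int  id;
    bool referenced;
};

/* list[0] is the least-recently-used end; list[n - 1] is the most recent. */
struct toy_lru {
    struct toy_entry list[TOY_CAP];
    size_t n;
};

/* Remove and return the entry at the least-recently-used end. */
static struct toy_entry toy_pop_front(struct toy_lru *l)
{
    struct toy_entry e = l->list[0];

    for (size_t i = 1; i < l->n; i++)
        l->list[i - 1] = l->list[i];
    l->n--;
    return e;
}

/* Scan up to nr_to_scan entries; record evicted ids and return their count. */
static size_t toy_shrink(struct toy_lru *l, size_t nr_to_scan, int *evicted)
{
    size_t shrunk = 0;

    assert(l->n <= TOY_CAP);
    while (nr_to_scan-- && l->n > 0) {
        struct toy_entry e = toy_pop_front(l);

        if (e.referenced) {
            e.referenced = false;   /* the second chance is now used up */
            l->list[l->n++] = e;    /* move to the back, not the front */
            continue;
        }
        evicted[shrunk++] = e.id;
    }
    return shrunk;
}
```

If the referenced entry were reinserted at index 0 instead, the very next scan would pop it again with the bit already cleared and evict it, which matches the bug the commit describes.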
2016-12-03 | pNFS: Add a layoutreturn callback to perform layout-private setup | Trond Myklebust
Add a callback to allow the flexfiles layout driver to initialise the layout private payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
A couple of conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkmann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>