summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-05-03SMB3: Work around mount failure when using SMB3 dialect to MacsSteve French
Macs send the maximum buffer size in response on ioctl to validate negotiate security information, which causes us to fail the mount as the response buffer is larger than the expected response. Changed ioctl response processing to allow for padding of validate negotiate ioctl response and limit the maximum response size to maximum buffer size. Signed-off-by: Steve French <steve.french@primarydata.com> CC: Stable <stable@vger.kernel.org>
2017-05-03f2fs: fix a mount fail for wrong next_scan_nidYunlei He
-write_checkpoint -do_checkpoint -next_free_nid <--- something wrong with next free nid -f2fs_fill_super -build_node_manager -build_free_nids -get_current_nat_page -__get_meta_page <--- attempt to access beyond end of device Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - most of MM - KASAN updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) kasan: separate report parts by empty lines kasan: improve double-free report format kasan: print page description after stacks kasan: improve slab object description kasan: change report header kasan: simplify address description logic kasan: change allocation and freeing stack traces headers kasan: unify report headers kasan: introduce helper functions for determining bug type mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page mm: hwpoison: call shake_page() unconditionally mm/swapfile.c: fix swap space leak in error path of swap_free_entries() mm/gup.c: fix access_ok() argument type mm/truncate: avoid pointless cleancache_invalidate_inode() calls. mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty fs/block_dev: always invalidate cleancache in invalidate_bdev() fs: fix data invalidation in the cleancache during direct IO zram: reduce load operation in page_same_filled zram: use zram_free_page instead of open-coded zram: introduce zram data accessor ...
2017-05-03cifs: fix CIFS_IOC_GET_MNT_INFO oopsDavid Disseldorp
An open directory may have a NULL private_data pointer prior to readdir. Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3") Cc: stable@vger.kernel.org Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-03CIFS: fix mapping of SFM_SPACE and SFM_PERIODBjörn Jacke
- trailing space maps to 0xF028 - trailing period maps to 0xF029 This fix corrects the mapping of file names which have a trailing character that would otherwise be illegal (period or space) but is allowed by POSIX. Signed-off-by: Bjoern Jacke <bjacke@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-03fs/block_dev: always invalidate cleancache in invalidate_bdev()Andrey Ryabinin
invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0 which doen't make any sense. Make sure that invalidate_bdev() always calls cleancache_invalidate_inode() regardless of mapping->nrpages value. Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache") Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03fs: fix data invalidation in the cleancache during direct IOAndrey Ryabinin
Patch series "Properly invalidate data in the cleancache", v2. We've noticed that after direct IO write, buffered read sometimes gets stale data which is coming from the cleancache. The reason for this is that some direct write hooks call call invalidate_inode_pages2[_range]() conditionally iff mapping->nrpages is not zero, so we may not invalidate data in the cleancache. Another odd thing is that we check only for ->nrpages and don't check for ->nrexceptional, but invalidate_inode_pages2[_range] also invalidates exceptional entries as well. So we invalidate exceptional entries only if ->nrpages != 0? This doesn't feel right. - Patch 1 fixes direct IO writes by removing ->nrpages check. - Patch 2 fixes similar case in invalidate_bdev(). Note: I only fixed conditional cleancache_invalidate_inode() here. Do we also need to add ->nrexceptional check in into invalidate_bdev()? - Patches 3-4: some optimizations. This patch (of 4): Some direct IO write fs hooks call invalidate_inode_pages2[_range]() conditionally iff mapping->nrpages is not zero. This can't be right, because invalidate_inode_pages2[_range]() also invalidate data in the cleancache via cleancache_invalidate_inode() call. So if page cache is empty but there is some data in the cleancache, buffered read after direct IO write would get stale data from the cleancache. Also it doesn't feel right to check only for ->nrpages because invalidate_inode_pages2[_range] invalidates exceptional entries as well. Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages state. Note: nfs,cifs,9p doesn't need similar fix because the never call cleancache_get_page() (nor directly, nor via mpage_readpage[s]()), so they are not affected by this bug. Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache") Link: http://lkml.kernel.org/r/20170424164135.22350-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03jbd2: make the whole kjournald2 kthread NOFS safeMichal Hocko
kjournald2 is central to the transaction commit processing. As such any potential allocation from this kernel thread has to be GFP_NOFS. Make sure to mark the whole kernel thread GFP_NOFS by the memalloc_nofs_save. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03jbd2: mark the transaction context with the scope GFP_NOFS contextMichal Hocko
now that we have memalloc_nofs_{save,restore} api we can mark the whole transaction context as implicitly GFP_NOFS. All allocations will automatically inherit GFP_NOFS this way. This means that we do not have to mark any of those requests with GFP_NOFS and moreover all the ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded GFP_KERNEL allocations deep inside the vmalloc will be NOFS now. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*Michal Hocko
kmem_zalloc_large and _xfs_buf_map_pages use memalloc_noio_{save,restore} API to prevent from reclaim recursion into the fs because vmalloc can invoke unconditional GFP_KERNEL allocations and these functions might be called from the NOFS contexts. The memalloc_noio_save will enforce GFP_NOIO context which is even weaker than GFP_NOFS and that seems to be unnecessary. Let's use memalloc_nofs_{save,restore} instead as it should provide exactly what we need here - implicit GFP_NOFS context. Link: http://lkml.kernel.org/r/20170306131408.9828-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: introduce memalloc_nofs_{save,restore} APIMichal Hocko
GFP_NOFS context is used for the following 5 reasons currently: - to prevent from deadlocks when the lock held by the allocation context would be needed during the memory reclaim - to prevent from stack overflows during the reclaim because the allocation is performed from a deep context already - to prevent lockups when the allocation context depends on other reclaimers to make a forward progress indirectly - just in case because this would be safe from the fs POV - silence lockdep false positives Unfortunately overuse of this allocation context brings some problems to the MM. Memory reclaim is much weaker (especially during heavy FS metadata workloads), OOM killer cannot be invoked because the MM layer doesn't have enough information about how much memory is freeable by the FS layer. In many cases it is far from clear why the weaker context is even used and so it might be used unnecessarily. We would like to get rid of those as much as possible. One way to do that is to use the flag in scopes rather than isolated cases. Such a scope is declared when really necessary, tracked per task and all the allocation requests from within the context will simply inherit the GFP_NOFS semantic. Not only this is easier to understand and maintain because there are much less problematic contexts than specific allocation requests, this also helps code paths where FS layer interacts with other layers (e.g. crypto, security modules, MM etc...) and there is no easy way to convey the allocation context between the layers. Introduce memalloc_nofs_{save,restore} API to control the scope of GFP_NOFS allocation context. This is basically copying memalloc_noio_{save,restore} API we have for other restricted allocation context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is just an alias for PF_FSTRANS which has been xfs specific until recently. There are no more PF_FSTRANS users anymore so let's just drop it. PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags is renamed to current_gfp_context because it now cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve their semantic. kmem_flags_convert() doesn't need to evaluate the flag anymore. This patch shouldn't introduce any functional changes. Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS) usage as much as possible and only use a properly documented memalloc_nofs_{save,restore} checkpoints where they are appropriate. [akpm@linux-foundation.org: fix comment typo, reflow comment] Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFSMichal Hocko
xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite some time ago. We would like to make this concept more generic and use it for other filesystems as well. Let's start by giving the flag a more generic name PF_MEMALLOC_NOFS which is in line with an exiting PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO contexts. Replace all PF_FSTRANS usage from the xfs code in the first step before we introduce a full API for it as xfs uses the flag directly anyway. This patch doesn't introduce any functional change. Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03proc: show MADV_FREE pages info in smapsShaohua Li
Show MADV_FREE pages info of each vma in smaps. The interface is for diganose or monitoring purpose, userspace could use it to understand what happens in the application. Since userspace could dirty MADV_FREE pages without notice from kernel, this interface is the only place we can get accurate accounting info about MADV_FREE pages. [mhocko@kernel.org: update Documentation/filesystems/proc.txt] Link: http://lkml.kernel.org/r/89efde633559de1ec07444f2ef0f4963a97a2ce8.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03fs/ocfs2/cluster: use offset_in_page() macroGeliang Tang
Use offset_in_page() macro instead of open-coding. Link: http://lkml.kernel.org/r/4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03ocfs2: o2hb: revert hb threshold to keep compatibleJunxiao Bi
Configfs is the interface for ocfs2-tools to set configure to kernel and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold is the one used to configure heartbeat dead threshold. Kernel has a default value of it but user can set O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it. Commit 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") changed heartbeat dead threshold name while ocfs2-tools did not, so ocfs2-tools won't set this configurable and the default value is always used. So revert it. Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03fs/ocfs2/cluster: use setup_timerGeliang Tang
Use setup_timer() instead of init_timer() to simplify the code. Link: http://lkml.kernel.org/r/5e75bf07beb91e092d5aa36c36769949a480456a.1489060564.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGSChao Yu
This patch expands cover region of inode->i_rwsem to keep setting flag atomically. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: Make flush bios explicitely syncJan Kara
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 Cc: stable@vger.kernel.org # 4.9+ CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03xfs: reserve enough blocks to handle btree splits when remappingDarrick J. Wong
In xfs_reflink_end_cow, we erroneously reserve only enough blocks to handle adding 1 extent. This is problematic if we fragment free space, have to do CoW, and then have to perform multiple bmap btree expansions. Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to handle btree splits, so log recovery fails after our first error causes the filesystem to go down. Therefore, refactor the transaction block reservation macros until we have a macro that works for our deferred (re)mapping activities, and fix both problems by using that macro. With 1k blocks we can hit this fairly often in g/187 if the scratch fs is big enough. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-05-03Merge branch 'generic' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota, reiserfs, udf and ext2 updates from Jan Kara: "The branch contains changes to quota code so that it does not modify persistent flags in inode->i_flags (it was the only place in kernel doing that) and handle it inside filesystem's quotaon/off handlers instead. The branch also contains two UDF cleanups, a couple of reiserfs fixes and one fix for ext2 quota locking" * 'generic' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext4: Improve comments in ext4_quota_{on|off}() udf: use kmap_atomic for memcpy copying udf: use octal for permissions quota: Remove dquot_quotactl_ops reiserfs: Remove i_attrs_to_sd_attrs() reiserfs: Remove useless setting of i_flags jfs: Remove jfs_get_inode_flags() ext2: Remove ext2_get_inode_flags() ext4: Remove ext4_get_inode_flags() quota: Stop setting IMMUTABLE and NOATIME flags on quota files jfs: Set flags on quota files directly ext2: Set flags on quota files directly reiserfs: Set flags on quota files directly ext4: Set flags on quota files directly reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore reiserfs: Make cancel_old_flush() reliable ext2: Call dquot_writeback_dquots() with s_umount held reiserfs: avoid a -Wmaybe-uninitialized warning
2017-05-03Merge branch 'fsnotify' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "The branch contains mainly a rework of fsnotify infrastructure fixing a shortcoming that we have waited for response to fanotify permission events with SRCU read lock held and when the process consuming events was slow to respond the kernel has stalled. It also contains several cleanups of unnecessary indirections in fsnotify framework and a bugfix from Amir fixing leakage of kernel internal errno to userspace" * 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits) fanotify: don't expose EOPENSTALE to userspace fsnotify: remove a stray unlock fsnotify: Move ->free_mark callback to fsnotify_ops fsnotify: Add group pointer in fsnotify_init_mark() fsnotify: Drop inode_mark.c fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() fsnotify: Remove fsnotify_detach_group_marks() fsnotify: Rename fsnotify_clear_marks_by_group_flags() fsnotify: Inline fsnotify_clear_{inode|vfsmount}_mark_group() fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() fanotify: Release SRCU lock when waiting for userspace response fsnotify: Pass fsnotify_iter_info into handle_event handler fsnotify: Provide framework for dropping SRCU lock in ->handle_event fsnotify: Remove special handling of mark destruction on group shutdown fsnotify: Detach mark from object list when last reference is dropped fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() inotify: Do not drop mark reference under idr_lock fsnotify: Free fsnotify_mark_connector when there is no mark attached fsnotify: Lock object list with connector lock ...
2017-05-03f2fs: show available_nids in f2fs/statusJaegeuk Kim
This patch adds an entry in f2fs/status to show # of available nids. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: flush dirty nats periodicallyJaegeuk Kim
This patch flushes dirty nats in order to acquire available nids by writing checkpoint. Otherwise, we can have no chance to get freed nids. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discardChao Yu
Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed before umount, so once we do mount with image which contain the flag, we don't record invalid blocks as undiscard one, when fstrim is being triggered, we can avoid issuing redundant discard commands. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: allow cpc->reason to indicate more than one reasonChao Yu
Change to use different bits of cpc->reason to indicate different status, so cpc->reason can indicate more than one reason. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03f2fs: release cp and dnode lock before IPUHou Pengyang
We don't need to rewrite the page under cp_rwsem and dnode locks. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commitsFred Isaman
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-03Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: IMA: - provide ">" and "<" operators for fowner/uid/euid rules KEYS: - add a system blacklist keyring - add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction functionality to userland via keyctl() LSM: - harden LSM API with __ro_after_init - add prlmit security hook, implement for SELinux - revive security_task_alloc hook TPM: - implement contextual TPM command 'spaces'" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits) tpm: Fix reference count to main device tpm_tis: convert to using locality callbacks tpm: fix handling of the TPM 2.0 event logs tpm_crb: remove a cruft constant keys: select CONFIG_CRYPTO when selecting DH / KDF apparmor: Make path_max parameter readonly apparmor: fix parameters so that the permission test is bypassed at boot apparmor: fix invalid reference to index variable of iterator line 836 apparmor: use SHASH_DESC_ON_STACK security/apparmor/lsm.c: set debug messages apparmor: fix boolreturn.cocci warnings Smack: Use GFP_KERNEL for smk_netlbl_mls(). smack: fix double free in smack_parse_opts_str() KEYS: add SP800-56A KDF support for DH KEYS: Keyring asymmetric key restrict method with chaining KEYS: Restrict asymmetric key linkage using a specific keychain KEYS: Add a lookup_restriction function for the asymmetric key type KEYS: Add KEYCTL_RESTRICT_KEYRING KEYS: Consistent ordering for __key_link_begin and restrict check KEYS: Add an optional lookup_restriction hook to key_type ...
2017-05-03fs: don't set *REFERENCED on single use objectsJosef Bacik
By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or inode we create. This is problematic as this means that it takes two trips through the LRU for any of these objects to be reclaimed, regardless of their actual lifetime. With enough pressure from these caches we can easily evict our working set from page cache with single use objects. So instead only set *REFERENCED if we've already been added to the LRU list. This means that we've been touched since the first time we were accessed, and so more likely to need to hang out in cache. To illustrate this issue I wrote the following scripts https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure on my test box. It is a single socket 4 core CPU with 16gib of RAM and I tested on an Intel 2tib NVME drive. The cache-pressure.sh script creates a new file system and creates 2 6.5gib files in order to take up 13gib of the 16gib of ram with pagecache. Then it runs a test program that reads these 2 files in a loop, and keeps track of how often it has to read bytes for each loop. On an ideal system with no pressure we should have to read 0 bytes indefinitely. The second thing this script does is start a fs_mark job that creates a ton of 0 length files, putting pressure on the system with slab only allocations. On exit the script prints out how many bytes were read by the read-file program. The results are as follows Without patch: /mnt/btrfs-test/reads/file1: total read during loops 27262988288 /mnt/btrfs-test/reads/file2: total read during loops 27262976000 With patch: /mnt/btrfs-test/reads/file2: total read during loops 18640457728 /mnt/btrfs-test/reads/file1: total read during loops 9565376512 This patch results in a 50% reduction of the amount of pages evicted from our working set. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-03CIFS: fix oplock break deadlocksRabin Vincent
When the final cifsFileInfo_put() is called from cifsiod and an oplock break work is queued, lockdep complains loudly: ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ #21 Not tainted --------------------------------------------- kworker/0:2/78 is trying to acquire lock: ("cifsiod"){++++.+}, at: flush_work+0x215/0x350 but task is already holding lock: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock("cifsiod"); lock("cifsiod"); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by kworker/0:2/78: #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0 stack backtrace: CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21 Workqueue: cifsiod cifs_writev_complete Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? match_held_lock+0x20/0x2b0 ? trace_hardirqs_off_caller+0x86/0x130 ? mark_lock+0xa6/0x920 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? flush_work+0x215/0x350 flush_work+0x236/0x350 ? flush_work+0x215/0x350 ? destroy_worker+0x170/0x170 __cancel_work_timer+0x17d/0x210 ? ___preempt_schedule+0x16/0x18 cancel_work_sync+0x10/0x20 cifsFileInfo_put+0x338/0x7f0 cifs_writedata_release+0x2a/0x40 ? cifs_writedata_release+0x2a/0x40 cifs_writev_complete+0x29d/0x850 ? preempt_count_sub+0x18/0xd0 process_one_work+0x304/0x8e0 worker_thread+0x9b/0x6a0 kthread+0x1b2/0x200 ? process_one_work+0x8e0/0x8e0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x31/0x40 This is a real warning. Since the oplock is queued on the same workqueue this can deadlock if there is only one worker thread active for the workqueue (which will be the case during memory pressure when the rescuer thread is handling it). Furthermore, there is at least one other kind of hang possible due to the oplock break handling if there is only worker. (This can be reproduced without introducing memory pressure by having passing 1 for the max_active parameter of cifsiod.) cifs_oplock_break() can wait indefintely in the filemap_fdatawait() while the cifs_writev_complete() work is blocked: sysrq: SysRq : Show Blocked State task PC stack pid father kworker/0:1 D 0 16 2 0x00000000 Workqueue: cifsiod cifs_oplock_break Call Trace: __schedule+0x562/0xf40 ? mark_held_locks+0x4a/0xb0 schedule+0x57/0xe0 io_schedule+0x21/0x50 wait_on_page_bit+0x143/0x190 ? add_to_page_cache_lru+0x150/0x150 __filemap_fdatawait_range+0x134/0x190 ? do_writepages+0x51/0x70 filemap_fdatawait_range+0x14/0x30 filemap_fdatawait+0x3b/0x40 cifs_oplock_break+0x651/0x710 ? preempt_count_sub+0x18/0xd0 process_one_work+0x304/0x8e0 worker_thread+0x9b/0x6a0 kthread+0x1b2/0x200 ? process_one_work+0x8e0/0x8e0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x31/0x40 dd D 0 683 171 0x00000000 Call Trace: __schedule+0x562/0xf40 ? mark_held_locks+0x29/0xb0 schedule+0x57/0xe0 io_schedule+0x21/0x50 wait_on_page_bit+0x143/0x190 ? add_to_page_cache_lru+0x150/0x150 __filemap_fdatawait_range+0x134/0x190 ? do_writepages+0x51/0x70 filemap_fdatawait_range+0x14/0x30 filemap_fdatawait+0x3b/0x40 filemap_write_and_wait+0x4e/0x70 cifs_flush+0x6a/0xb0 filp_close+0x52/0xa0 __close_fd+0xdc/0x150 SyS_close+0x33/0x60 entry_SYSCALL_64_fastpath+0x1f/0xbe Showing all locks held in the system: 2 locks held by kworker/0:1/16: #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0 #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0 Showing busy workqueues and worker pools: workqueue cifsiod: flags=0xc pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 in-flight: 16:cifs_oplock_break delayed: cifs_writev_complete, cifs_echo_request pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3 Fix these problems by creating a a new workqueue (with a rescuer) for the oplock break work. Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-05-03cifs: fix CIFS_ENUMERATE_SNAPSHOTS oopsDavid Disseldorp
As with 618763958b22, an open directory may have a NULL private_data pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this before dereference. Fixes: 834170c85978 ("Enable previous version support") Signed-off-by: David Disseldorp <ddiss@suse.de> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-03cifs: fix leak in FSCTL_ENUM_SNAPS response handlingDavid Disseldorp
The server may respond with success, and an output buffer less than sizeof(struct smb_snapshot_array) in length. Do not leak the output buffer in this case. Fixes: 834170c85978 ("Enable previous version support") Signed-off-by: David Disseldorp <ddiss@suse.de> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02f2fs: shrink size of struct discard_cmdChao Yu
In order to shrink size of struct discard_cmd, change variable type of @state in struct discard_cmd from int to unsigned char. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: don't hold cmd_lock during waiting discard commandChao Yu
Previously, with protection of cmd_lock, we will wait for end io of discard command which potentially may lead long latency, making worse concurrency. So, in this patch, we try to add reference into discard entry to prevent the entry being released by other thread, then we can avoid holding global cmd_lock during waiting discard to finish. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: nullify fio->encrypted_page for each writesJaegeuk Kim
This makes sure each write request has nullified encrypted_page pointer. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: sanity check segment countJin Qian
F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce valid_ipu_blkaddr to clean upJaegeuk Kim
This patch introduces valid_ipu_blkaddr to clean up checking block address for inplace-update. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: lookup extent cache first under IPU scenarioHou Pengyang
If a page is cold, NOT atomit written and need_ipu now, there is a high probability that IPU should be adapted. For IPU, we try to check extent tree to get the block index first, instead of reading the dnode page, where may lead to an useless dnode IO, since no need to update the dnode index for IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: reconstruct code to write a data pageHou Pengyang
This patch introduces encrypt_one_page which encrypts one data page before submit_bio, and change the use of need_inplace_update. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce __wait_discard_cmdChao Yu
Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce __issue_discard_cmdChao Yu
Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tty: fix comment for __tty_alloc_driver() init/main: properly align the multi-line comment init/main: Fix double "the" in comment Fix dead URLs to ftp.kernel.org drivers: Clean up duplicated email address treewide: Fix typo in xml/driver-api/basics.xml tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall" selftests/timers: Spelling s/privledges/privileges/ HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/ net: phy: dp83848: Fix Typo UBI: Fix typos Documentation: ftrace.txt: Correct nice value of 120 priority net: fec: Fix typo in error msg and comment treewide: Fix typos in printk
2017-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch updates from Jiri Kosina: - a per-task consistency model is being added for architectures that support reliable stack dumping (extending this, currently rather trivial set, is currently in the works). This extends the nature of the types of patches that can be applied by live patching infrastructure. The code stems from the design proposal made [1] back in November 2014. It's a hybrid of SUSE's kGraft and RH's kpatch, combining advantages of both: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Most of the heavy lifting done by Josh Poimboeuf with help from Miroslav Benes and Petr Mladek [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz - module load time patch optimization from Zhou Chengming - a few assorted small fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing printk newlines livepatch: Cancel transition a safe way for immediate patches livepatch: Reduce the time of finding module symbols livepatch: make klp_mutex proper part of API livepatch: allow removal of a disabled patch livepatch: add /proc/<pid>/patch_state livepatch: change to a per-task consistency model livepatch: store function sizes livepatch: use kstrtobool() in enabled_store() livepatch: move patching functions into patch.c livepatch: remove unnecessary object loaded check livepatch: separate enabled and patched states livepatch/s390: add TIF_PATCH_PENDING thread flag livepatch/s390: reorganize TIF thread flag bits livepatch/powerpc: add TIF_PATCH_PENDING thread flag livepatch/x86: add TIF_PATCH_PENDING thread flag livepatch: create temporary klp_update_patch_state() stub x86/entry: define _TIF_ALLWORK_MASK flags explicitly stacktrace/x86: add function for detecting reliable stack traces
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-05-02Set unicode flag on cifs echo request to avoid Mac errorSteve French
Mac requires the unicode flag to be set for cifs, even for the smb echo request (which doesn't have strings). Without this Mac rejects the periodic echo requests (when mounting with cifs) that we use to check if server is down Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-05-02CIFS: Add asynchronous write support through kernel AIOPavel Shilovsky
This patch adds support to process write calls passed by io_submit() asynchronously. It based on the previously introduced async context that allows to process i/o responses in a separate thread and return the caller immediately for asynchronous calls. This improves writing performance of single threaded applications with increasing of i/o queue depth size. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02CIFS: Add asynchronous read support through kernel AIOPavel Shilovsky
This patch adds support to process read calls passed by io_submit() asynchronously. It based on the previously introduced async context that allows to process i/o responses in a separate thread and return the caller immediately for asynchronous calls. This improves reading performance of single threaded applications with increasing of i/o queue depth size. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02CIFS: Add asynchronous context to support kernel AIOPavel Shilovsky
Currently the code doesn't recognize asynchronous calls passed by io_submit() and processes all calls synchronously. This is not what kernel AIO expects. This patch introduces a new async context that keeps track of all issued i/o requests and moves a response collecting procedure to a separate thread. This allows to return to a caller immediately for async calls and call iocb->ki_complete() once all requests are completed. For sync calls the current thread simply waits until all requests are completed. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02cifs: fix IPv6 link local, with scope id, address parsingDaniel N Pettersson
When the IP address is gotten from the UNC, use only the address part of the UNC. Else all after the percent sign in an IPv6 link local address is interpreted as a scope id. This includes the slash and share name. A scope id is expected to be an integer and any trailing characters makes the conversion to integer fail. Example of mount command that fails: mount -i -t cifs //fe80::6a05:caff:fe3e:8ffc%2/test /mnt/t -o sec=none Signed-off-by: Daniel N Pettersson <danielnp@axis.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02cifs: small underflow in cnvrtDosUnixTm()Dan Carpenter
January is month 1. There is no zero-th month. If someone passes a zero month then it means we read from one space before the start of the total_days_of_prev_months[] array. We may as well also be strict about days as well. Fixes: 1bd5bbcb6531 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>