summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-11-28nfsd: fix a warning in __cld_pipe_upcall()Scott Mayhew
__cld_pipe_upcall() emits a "do not call blocking ops when !TASK_RUNNING" warning due to the dput() call in rpc_queue_upcall(). Fix it by using a completion instead of hand coding the wait. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28nfsd4: fix crash on writing v4_end_grace before nfsd startupJ. Bruce Fields
Anatoly Trosinenko reports that this: 1) Checkout fresh master Linux branch (tested with commit e195ca6cb) 2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build 3) From `kvm-xfstests shell`: results in NULL dereference in locks_end_grace. Check that nfsd has been started before trying to end the grace period. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28dax: Don't access a freed inodeMatthew Wilcox
After we drop the i_pages lock, the inode can be freed at any time. The get_unlocked_entry() code has no choice but to reacquire the lock, so it can't be used here. Create a new wait_entry_unlocked() which takes care not to acquire the lock or dereference the address_space in any way. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-28dax: Check page->mapping isn't NULLMatthew Wilcox
If we race with inode destroy, it's possible for page->mapping to be NULL before we even enter this routine, as well as after having slept waiting for the dax entry to become unlocked. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: <stable@vger.kernel.org> Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-28Merge tag 'for-4.20-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Some of these bugs are being hit during testing so we'd like to get them merged, otherwise there are usual stability fixes for stable trees" * tag 'for-4.20-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: relocation: set trans to be NULL after ending transaction Btrfs: fix race between enabling quotas and subvolume creation Btrfs: send, fix infinite loop due to directory rename dependencies Btrfs: ensure path name is null terminated at btrfs_control_ioctl Btrfs: fix rare chances for data loss when doing a fast fsync btrfs: Always try all copies when reading extent buffers
2018-11-28cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is activeKiran Kumar Modukuri
[Description] In a heavily loaded system where the system pagecache is nearing memory limits and fscache is enabled, pages can be leaked by fscache while trying read pages from cachefiles backend. This can happen because two applications can be reading same page from a single mount, two threads can be trying to read the backing page at same time. This results in one of the threads finding that a page for the backing file or netfs file is already in the radix tree. During the error handling cachefiles does not clean up the reference on backing page, leading to page leak. [Fix] The fix is straightforward, to decrement the reference when error is encountered. [dhowells: Note that I've removed the clearance and put of newpage as they aren't attested in the commit message and don't appear to actually achieve anything since a new page is only allocated is newpage!=NULL and any residual new page is cleared before returning.] [Testing] I have tested the fix using following method for 12+ hrs. 1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs 2) create 10000 files of 2.8MB in a NFS mount. 3) start a thread to simulate heavy VM presssure (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)& 4) start multiple parallel reader for data set at same time find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & .. .. find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & find /mnt/nfs -type f | xargs -P 80 cat > /dev/null & 5) finally check using cat /proc/fs/fscache/stats | grep -i pages ; free -h , cat /proc/meminfo and page-types -r -b lru to ensure all pages are freed. Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com> Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> [dja: forward ported to current upstream] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-28dlm: NULL check before kmem_cache_destroy is not neededWen Yang
kmem_cache_destroy(NULL) is safe, so removes NULL check before freeing the mem. This patch also fix ifnullfree.cocci warnings. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-28cachefiles: Fix an assertion failure when trying to update a failed objectDavid Howells
If cachefiles gets an error other then ENOENT when trying to look up an object in the cache (in this case, EACCES), the object state machine will eventually transition to the DROP_OBJECT state. This state invokes fscache_drop_object() which tries to sync the auxiliary data with the cache (this is done lazily since commit 402cb8dda949d) on an incomplete cache object struct. The problem comes when cachefiles_update_object_xattr() is called to rewrite the xattr holding the data. There's an assertion there that the cache object points to a dentry as we're going to update its xattr. The assertion trips, however, as dentry didn't get set. Fix the problem by skipping the update in cachefiles if the object doesn't refer to a dentry. A better way to do it could be to skip the update from the DROP_OBJECT state handler in fscache, but that might deny the cache the opportunity to update intermediate state. If this error occurs, the kernel log includes lines that look like the following: CacheFiles: Lookup failed error -13 CacheFiles: CacheFiles: Assertion failed ------------[ cut here ]------------ kernel BUG at fs/cachefiles/xattr.c:138! ... Workqueue: fscache_object fscache_object_work_func [fscache] RIP: 0010:cachefiles_update_object_xattr.cold.4+0x18/0x1a [cachefiles] ... Call Trace: cachefiles_update_object+0xdd/0x1c0 [cachefiles] fscache_update_aux_data+0x23/0x30 [fscache] fscache_drop_object+0x18e/0x1c0 [fscache] fscache_object_work_func+0x74/0x2b0 [fscache] process_one_work+0x18d/0x340 worker_thread+0x2e/0x390 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_bind+0x30/0x30 ret_from_fork+0x35/0x40 Note that there are actually two issues here: (1) EACCES happened on a cache object and (2) an oops occurred. I think that the second is a consequence of the first (it certainly looks like it ought to be). This patch only deals with the second. Fixes: 402cb8dda949 ("fscache: Attach the index key and aux data to the cookie") Reported-by: Zhibin Li <zhibli@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-28f2fs: avoid frequent costly fsck triggersJaegeuk Kim
If we want to re-enable nat_bits, we rely on fsck which requires full scan of directory tree. Let's do that by regular fsck or unclean shutdown. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-27lockd: fix decoding of TEST resultsJ. Bruce Fields
We fail to advance the read pointer when reading the stat.oh field that identifies the lock-holder in a TEST result. This turns out not to matter if the server is knfsd, which always returns a zero-length field. But other servers (Ganesha is an example) may not do this. The result is bad values in fcntl F_GETLK results. Fix this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: skip unused assignmentJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: forbid all renames during grace periodJ. Bruce Fields
The idea here was that renaming a file on a nosubtreecheck export would make lookups of the old filehandle return STALE, making it impossible for clients to reclaim opens. But during the grace period I think we should also hold off on operations that would break delegations. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: remove unused nfs4_check_olstateid parameterJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: zero-length WRITE should succeedJ. Bruce Fields
Zero-length writes are legal; from 5661 section 18.32.3: "If the count is zero, the WRITE will succeed and return a count of zero subject to permissions checking". This check is unnecessary and is causing zero-length reads to return EINVAL. Cc: stable@vger.kernel.org Fixes: 3fd9557aec91 "NFSD: Refactor the generic write vector fill helper" Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27fs/file: Replace synchronize_sched() with synchronize_rcu()Paul E. McKenney
Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org>
2018-11-27kernfs: Improve kernfs_notify() poll notification latencyRadu Rendec
kernfs_notify() does two notifications: poll and fsnotify. Originally, both notifications were done from scheduled work context and all that kernfs_notify() did was schedule the work. This patch simply moves the poll notification from the scheduled work handler to kernfs_notify(). The fsnotify notification still needs to be done from scheduled work context because it can sleep (it needs to lock a mutex). If the poll notification is time critical (the notified thread needs to wake as quickly as possible), it's better to do it from kernfs_notify() directly. One example is calling sysfs_notify_dirent() from a hardware interrupt handler to wake up a thread and handle the interrupt in user space. Signed-off-by: Radu Rendec <radu.rendec@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27ext2: fix potential use after freePan Bian
The function ext2_xattr_set calls brelse(bh) to drop the reference count of bh. After that, bh may be freed. However, following brelse(bh), it reads bh->b_data via macro HDR(bh). This may result in a use-after-free bug. This patch moves brelse(bh) after reading field. CC: stable@vger.kernel.org Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-27ext2: initialize opts.s_mount_opt as zero before using itxingaopeng
We need to initialize opts.s_mount_opt as zero before using it, else we may get some unexpected mount options. Fixes: 088519572ca8 ("ext2: Parse mount options into a dedicated structure") CC: stable@vger.kernel.org Signed-off-by: xingaopeng <xingaopeng@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-26f2fs: fix m_may_create to make OPU DIO write correctlyJia Zhu
Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to update new block address correctly for OPUJia Zhu
Previously, we allocated a new block address for OPU mode in direct_IO. But the new address couldn't be assigned to @map->m_pblk correctly. This patch fix it. Cc: <stable@vger.kernel.org> Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: adjust trace print in f2fs_get_victim() to cover all pathsSahitya Tummala
Adjust the trace print in f2fs_get_victim() to cover GC done by F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to allow node segment for GC by ioctl pathSahitya Tummala
Allow node type segments also to be GC'd via f2fs ioctl F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: make "f2fs_fault_name[]" const char *Alexey Dobriyan
Those strings are immutable. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: read page index before freeingPan Bian
The function truncate_node frees the page with f2fs_put_page. However, the page index is read after that. So, the patch reads the index before freeing the page. Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated") Cc: <stable@vger.kernel.org> Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix wrong return value of f2fs_acl_createTiezhu Yang
When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create() should return -EIO instead of -ENOMEM, this patch makes it consistent with posix_acl_create() which has been fixed in commit beaf226b863a ("posix_acl: don't ignore return value of posix_acl_create_masq()"). Fixes: 83dfe53c185e ("f2fs: fix reference leaks in f2fs_acl_create") Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: avoid build warn of fall_throughJaegeuk Kim
After merging the f2fs tree, today's linux-next build (x86_64_allmodconfig) produced this warning: In file included from fs/f2fs/dir.c:11: fs/f2fs/f2fs.h: In function '__mark_inode_dirty_flag': fs/f2fs/f2fs.h:2388:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (set) ^ fs/f2fs/f2fs.h:2390:2: note: here case FI_DATA_EXIST: ^~~~ Exposed by my use of -Wimplicit-fallthrough Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix race between write_checkpoint and write_beginSheng Yong
The following race could lead to inconsistent SIT bitmap: Task A Task B ====== ====== f2fs_write_checkpoint block_operations f2fs_lock_all down_write(node_change) down_write(node_write) ... sync ... up_write(node_change) f2fs_file_write_iter set_inode_flag(FI_NO_PREALLOC) ...... f2fs_write_begin(index=0, has inline data) prepare_write_begin __do_map_lock(AIO) => down_read(node_change) f2fs_convert_inline_page => update SIT __do_map_lock(AIO) => up_read(node_change) f2fs_flush_sit_entries <= inconsistent SIT finish write checkpoint sudden-power-off If SPO occurs after checkpoint is finished, SIT bitmap will be set incorrectly. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: check memory boundary by insane namelenJaegeuk Kim
If namelen is corrupted to have very long value, fill_dentries can copy wrong memory area. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: only flush the single temp bio cache which owns the target pageYunlong Song
Previously, when f2fs finds which temp bio cache owns the target page, it will flush all the three temp bio caches, but we only need to flush one single bio cache indeed, which can help to keep bio merged. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix out-place-update DIO writeChao Yu
In get_more_blocks(), we may override @create as below code: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create is 1, so in LFS mode, dio overwrite under LFS mode can easily run out of free segments, result in below panic. Call Trace: allocate_segment_by_default+0xa8/0x270 [f2fs] f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs] __allocate_data_block+0x306/0x480 [f2fs] f2fs_map_blocks+0x6f6/0x920 [f2fs] __get_data_block+0x4f/0xb0 [f2fs] get_data_block_dio_write+0x50/0x60 [f2fs] do_blockdev_direct_IO+0xcd5/0x21e0 __blockdev_direct_IO+0x3a/0x3c f2fs_direct_IO+0x1ff/0x4a0 [f2fs] generic_file_direct_write+0xd9/0x160 __generic_file_write_iter+0xbb/0x1e0 f2fs_file_write_iter+0xaf/0x220 [f2fs] __vfs_write+0xd0/0x130 vfs_write+0xb2/0x1b0 SyS_pwrite64+0x69/0xa0 ? vtime_user_exit+0x29/0x70 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8 So this patch introduces a parameter map.m_may_create to indicate that f2fs_map_blocks() is called from write or read path, which can give the right hint to let f2fs_map_blocks() trigger OPU allocation and call f2fs_balanc_fs() correctly. BTW, it disables physical address preallocation for direct IO in f2fs_preallocate_blocks, which is redundant to OPU allocation of f2fs_map_blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to be aware discard/preflush/dio command in is_idle()Chao Yu
This patch adds missing in-flight discard/preflush/dio command count check in is_idle(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: add to account direct IOChao Yu
This patch adds f2fs_dio_submit_bio() to hook submit_io/end_io functions in direct IO path, in order to account DIO. Later, we will add this count into is_idle() to let background GC/Discard thread be aware of DIO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: move dir data flush to write checkpoint processYunlei He
This patch move dir data flush to write checkpoint process, by doing this, it may reduce some time for dir fsync. pre: -f2fs_do_sync_file enter -file_write_and_wait_range <- flush & wait -write_checkpoint -do_checkpoint <- wait all -f2fs_do_sync_file exit now: -f2fs_do_sync_file enter -write_checkpoint -block_operations <- flush dir & no wait -do_checkpoint <- wait all -f2fs_do_sync_file exit Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: change segment to section in f2fs_ioc_gc_rangeYunlong Song
f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves blocks of section each time, so fix it from segment to section. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: export migration_granularity sysfs entryChao Yu
Add one sysfs entry to control migration granularity of GC in large section f2fs, it can be tuned to mitigate heavy overhead of migrating huge number of blocks in large section. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: clean up f2fs_sb_has_##feature_nameChao Yu
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to access .raw_super field, to avoid unneeded pointer conversion, this patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly. Just do cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: support subsectional garbage collectionChao Yu
Section is minimal garbage collection unit of f2fs, in zoned block device, or ancient block mapping flash device, in order to improve GC efficiency, we can align GC unit to lower device erase unit, normally, it consists of multiple of segments. Once background or foreground GC triggers, it brings a large number of IOs, which will impact user IO, and also occupy cpu/memory resource intensively. So, to reduce impact of GC on large size section, this patch supports subsectional GC, in one cycle of GC, it only migrate partial segment{s} in victim section. Currently, by default, we use sbi->segs_per_sec as migration granularity. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: remove codes of unused wio_mutexYunlong Song
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: introduce __is_large_section() for cleanupChao Yu
Introduce a wrapper __is_large_section() to clean up codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix count of seg_freed to make sec_freed correctYunlong Song
When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before gc starts, do_garbage_collect will skip counting seg_freed++, and this will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix to account preflush command for noflush_merge modeChao Yu
Previously, we only account preflush command for flush_merge mode, so for noflush_merge mode, we can not know in-flight preflush command count, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: avoid GC causing encrypted file corruptedYunlong Song
The encrypted file may be corrupted by GC in following case: Time 1: | segment 1 blkaddr = A | GC -> | segment 2 blkaddr = B | Encrypted block 1 is moved from blkaddr A of segment 1 to blkaddr B of segment 2, Time 2: | segment 1 blkaddr = B | GC -> | segment 3 blkaddr = C | Before page 1 is written back and if segment 2 become a victim, then page 1 is moved from blkaddr B of segment 2 to blkaddr Cof segment 3, during the GC process of Time 2, f2fs should wait for page 1 written back before reading it, or move_data_block will read a garbage block from blkaddr B since page is not written back to blkaddr B yet. Commit 6aa58d8a ("f2fs: readahead encrypted block during GC") introduce ra_data_block to read encrypted block, but it forgets to add f2fs_wait_on_page_writeback to avoid racing between GC and flush. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26fs/xfs: fix f_ffree value for statfs when project quota is setYe Yin
When project is set, we should use inode limit minus the used count Signed-off-by: Ye Yin <dbyin@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-26block: make blk_poll() take a parameter on whether to spin or notJens Axboe
blk_poll() has always kept spinning until it found an IO. This is fine for SYNC polling, since we need to find one request we have pending, but in preparation for ASYNC polling it can be beneficial to just check if we have any entries available or not. Existing callers are converted to pass in 'spin == true', to retain the old behavior. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-25ext4: add ext4_sb_bread() to disambiguate ENOMEM casesTheodore Ts'o
Today, when sb_bread() returns NULL, this can either be because of an I/O error or because the system failed to allocate the buffer. Since it's an old interface, changing would require changing many call sites. So instead we create our own ext4_sb_bread(), which also allows us to set the REQ_META flag. Also fixed a problem in the xattr code where a NULL return in a function could also mean that the xattr was not found, which could lead to the wrong error getting returned to userspace. Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Cc: stable@kernel.org # 2.6.19 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-11-25Merge tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix a NFSv4 state manager deadlock when returning a delegation - NFSv4.2 copy do not allocate memory under the lock - flexfiles: Use the correct stateid for IO in the tightly coupled case * tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: flexfiles: use per-mirror specified stateid for IO NFSv4.2 copy do not allocate memory under the lock NFSv4: Fix a NFSv4 state manager deadlock
2018-11-24Merge tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds
Pull XArray updates from Matthew Wilcox: "We found some bugs in the DAX conversion to XArray (and one bug which predated the XArray conversion). There were a couple of bugs in some of the higher-level functions, which aren't actually being called in today's kernel, but surfaced as a result of converting existing radix tree & IDR users over to the XArray. Some of the other changes to how the higher-level APIs work were also motivated by converting various users; again, they're not in use in today's kernel, so changing them has a low probability of introducing a bug. Dan can still trigger a bug in the DAX code with hot-offline/online, and we're working on tracking that down" * tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add missing locking dax: Avoid losing wakeup in dax_lock_mapping_entry dax: Fix huge page faults dax: Fix dax_unlock_mapping_entry for PMD pages dax: Reinstate RCU protection of inode dax: Make sure the unlocking entry isn't locked dax: Remove optimisation from dax_lock_mapping_entry XArray tests: Correct some 64-bit assumptions XArray: Correct xa_store_range XArray: Fix Documentation XArray: Handle NULL pointers differently for allocation XArray: Unify xa_store and __xa_store XArray: Add xa_store_bh() and xa_store_irq() XArray: Turn xa_erase into an exported function XArray: Unify xa_cmpxchg and __xa_cmpxchg XArray: Regularise xa_reserve nilfs2: Use xa_erase_irq XArray: Export __xa_foo to non-GPL modules XArray: Fix xa_for_each with a single element at 0
2018-11-24Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Dave and I have continued our work fixing corruption problems that can be found when running long-term burn-in exercisers on xfs. Here are some patches fixing most of the problems, but there will likely be more. :/ - Numerous corruption fixes for copy on write - Numerous corruption fixes for blocksize < pagesize writes - Don't miscalculate AG reservations for small final AGs - Fix page cache truncation to work properly for reflink and extent shifting - Fix use-after-free when retrying failed inode/dquot buffer logging - Fix corruptions seen when using copy_file_range in directio mode" * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: readpages doesn't zero page tail beyond EOF vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP iomap: dio data corruption and spurious errors when pipes fill iomap: sub-block dio needs to zeroout beyond EOF iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents xfs: delalloc -> unwritten COW fork allocation can go wrong xfs: flush removing page cache in xfs_reflink_remap_prep xfs: extent shifting doesn't fully invalidate page cache xfs: finobt AG reserves don't consider last AG can be a runt xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers xfs: uncached buffer tracing needs to print bno xfs: make xfs_file_remap_range() static xfs: fix shared extent data corruption due to missing cow reservation
2018-11-23Merge tag 'pm-4.20-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues in the Operating Performance Points (OPP) framework, one cpufreq driver issue, one problem related to the tasks freezer and a few build-related issues in the cpupower utility. Specifics: - Fix tasks freezer deadlock in de_thread() that occurs if one of its sub-threads has been frozen already (Chanho Min). - Avoid registering a platform device by the ti-cpufreq driver on platforms that cannot use it (Dave Gerlach). - Fix a mistake in the ti-opp-supply operating performance points (OPP) driver that caused an incorrect reference voltage to be used and make it adjust the minimum voltage dynamically to avoid hangs or crashes in some cases (Keerthy). - Fix issues related to compiler flags in the cpupower utility and correct a linking problem in it by renaming a file with a duplicate name (Jiri Olsa, Konstantin Khlebnikov)" * tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: exec: make de_thread() freezable cpufreq: ti-cpufreq: Only register platform_device when supported opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call opp: ti-opp-supply: Dynamically update u_volt_min tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true