path: root/fs
Age | Commit message | Author
2016-02-26 | f2fs: introduce f2fs_update_data_blkaddr for cleanup | Chao Yu
Add a new helper, f2fs_update_data_blkaddr, to clean up redundant code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 | f2fs crypto: fix incorrect positioning for GCing encrypted data page | Chao Yu
For now, the flow of GCing an encrypted data page is:
1) try to grab a meta page in the meta inode's mapping, using the old block address of that data page as its index;
2) load the ciphertext into the meta page;
3) allocate a new block address;
4) write the meta page to the new block address;
5) update the block address pointer in the direct node page.
Other readers and writers use f2fs_wait_on_encrypted_page_writeback to check for, and wait on, GCed encrypted data cached in a meta page under writeback, in order to avoid inconsistency among the data page cache, the meta page cache and the on-disk data during updates. However, that lookup uses the new block address updated in step 5) as the index into the inner bio buffer. This is wrong: we will never find the meta page being GCed, because step 1) indexed that page by the old block address. This patch fixes the issue by swapping steps 1) and 3), so that step 1) grabs the page with the index generated in step 3).
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
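A toy model of why the order matters: readers look up the meta page by the block address stored in the node page (the new one), so the page must be grabbed at the new address. Everything here (`meta_cache`, `alloc_block`, `gc_encrypted_page`, `lookup_meta_page`) is hypothetical and much simpler than f2fs's real page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy: one meta-cache slot per block address. Not f2fs code. */
#define NBLOCKS 16

struct meta_cache {
	bool cached[NBLOCKS];      /* is a meta page cached at this index? */
	char data[NBLOCKS][8];     /* cached ciphertext */
};

static int next_free = 8;      /* trivial bump allocator for the sketch */

static int alloc_block(void)
{
	assert(next_free < NBLOCKS);
	return next_free++;
}

/* Fixed order: allocate the new address first (old step 3), then grab the
 * meta page using that NEW address as its index (old step 1) and load the
 * ciphertext (step 2). Returns the address the node page is updated with. */
static int gc_encrypted_page(struct meta_cache *mc, const char *ciphertext)
{
	int new_blkaddr = alloc_block();

	mc->cached[new_blkaddr] = true;
	strncpy(mc->data[new_blkaddr], ciphertext, sizeof(mc->data[0]) - 1);
	mc->data[new_blkaddr][sizeof(mc->data[0]) - 1] = '\0';
	return new_blkaddr;
}

/* Readers waiting on writeback look up the address found in the node page. */
static bool lookup_meta_page(const struct meta_cache *mc, int blkaddr)
{
	return blkaddr >= 0 && blkaddr < NBLOCKS && mc->cached[blkaddr];
}
```

With the fixed order, a lookup by the new address finds the page; the old address no longer indexes anything.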
2016-02-26 | Orangefs: update orangefs.txt | Mike Marshall
Al Viro has cleaned up the way ops are processed and waited for; orangefs.txt now has an overview of how it works. Several recent related commits have added to the comments in the code as well. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 | Orangefs: code sanitation. | Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 | orangefs: remove unused 'diff' function | Arnd Bergmann
orangefs contains a helper function to calculate the difference between two timeval structures. We are trying to remove all instances of timespec from the kernel, and this one is not used at all, so let's remove it now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 | orangefs: avoid time conversion function | Arnd Bergmann
The new orangefs code uses a helper function to read a time field to its private structures from struct iattr. This will conflict with the move to 64-bit timestamps in the kernel and is generally not necessary. This replaces the conversion with a simple cast to time64_t that shows what is going on. As the orangefs-internal representation already uses 64-bit timestamps, there should be no ambiguity to negative values, and the cast ensures that we treat them as times before 1970 on both 32-bit and 64-bit architectures, rather than times after 2038. This patch keeps that behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
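A minimal sketch of the sign behavior the commit relies on, assuming a seconds value held in a signed 32-bit field (as with a 32-bit time_t). `time64_t` is typedef'd locally as a stand-in for the kernel type; the two helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_t;   /* local stand-in for the kernel's time64_t */

/* Plain signed cast: sign-extends, so negative seconds stay before 1970. */
static time64_t to_time64_signed(int32_t secs)
{
	return (time64_t)secs;
}

/* Contrast: going through an unsigned type turns negative values into
 * large positive ones, i.e. times after 2038. */
static time64_t to_time64_unsigned(int32_t secs)
{
	return (time64_t)(uint32_t)secs;
}
```

For example, `to_time64_signed(-1)` is -1 (one second before the epoch), while `to_time64_unsigned(-1)` is 4294967295.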
2016-02-26 | Merge branch 'dev/control-ioctl' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'misc-4.6' into for-chris-4.6 | David Sterba
# Conflicts:
#   fs/btrfs/file.c
2016-02-26 | Merge branch 'cleanups-4.6' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'foreign/josef/space-updates' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'foreign/zhaolei/reada' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'dev/rename-keys' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'dev/gfp-flags' into for-chris-4.6 | David Sterba
2016-02-26 | Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6 | David Sterba
# Conflicts:
#   fs/btrfs/file.c
2016-02-26 | configfs: Replace CURRENT_TIME by current_fs_time() | Deepa Dinamani
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
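Roughly, current_fs_time() truncates the current time to the superblock's s_time_gran (nanoseconds per timestamp tick), which CURRENT_TIME does not do. A userspace sketch of that truncation, modeled on the kernel's timespec_trunc(); `fs_timespec` and `trunc_to_gran` are hypothetical names.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for the kernel's struct timespec. */
struct fs_timespec {
	long long tv_sec;
	long      tv_nsec;
};

/* Truncate t to a timestamp granularity of gran nanoseconds. */
static struct fs_timespec trunc_to_gran(struct fs_timespec t, long gran)
{
	assert(gran >= 1 && gran <= NSEC_PER_SEC);
	if (gran == NSEC_PER_SEC)
		t.tv_nsec = 0;                  /* whole-second timestamps */
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;  /* round down to a multiple of gran */
	/* gran == 1: full nanosecond precision, nothing to do */
	return t;
}
```

A filesystem with 1 s granularity therefore never stores a nonzero nanosecond field, whereas an untruncated CURRENT_TIME value would carry nanoseconds the on-disk format cannot keep.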
2016-02-25 | f2fs: fix incorrect upper bound when iterating inode mapping tree | Chao Yu
1. The inode mapping tree can index pages in the range [0, ULONG_MAX]; however, in some places f2fs only searches or iterates pages in the range [0, LONG_MAX], resulting in page cache misses.
2. filemap_fdatawait_range takes its range parameters in bytes, so the maximum range it covers should be [0, LLONG_MAX]; if we use [0, LONG_MAX] as the range when waiting on writeback, a large number of pages will not be covered.
This patch corrects both issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
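A quick check of the arithmetic behind both items, assuming 4 KiB pages and modeling a 32-bit `long` with fixed-width types. `pages_covered` and `index_searched` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Item 2: number of pages covered by the byte range [0, end_byte]. */
static uint64_t pages_covered(uint64_t end_byte)
{
	return (end_byte >> PAGE_SHIFT) + 1;
}

/* Item 1: a search bounded by max_index misses any page index above it. */
static int index_searched(uint32_t index, uint32_t max_index)
{
	return index <= max_index;
}
```

On a 32-bit machine, a byte bound of LONG_MAX (2^31 - 1) covers only 524288 pages (2 GiB), while LLONG_MAX covers 2^51 pages; likewise a page index of 0x80000000 is valid up to ULONG_MAX but lies above LONG_MAX.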
2016-02-25 | Fix directory hardlinks from deleted directories | David Woodhouse
When a directory is deleted, we don't take too much care about killing off all the dirents that belong to it, on the basis that on remount, the scan will conclude that the directory is dead anyway. This doesn't work, though, when the deleted directory contained a child directory which was moved *out*. In the early stages of the fs build we can then end up with an apparent hard link, with the child directory appearing both in its true location and as a child of the original directory, which at this stage of the mount process we don't *yet* know is defunct. To resolve this, take out the early special-casing of the "directories shall not have hard links" rule in jffs2_build_inode_pass1(), and let the normal nlink processing happen for directories as well as other inodes. Then later in the build process we can set ic->pino_nlink to the parent inode#, as is required for directories during normal operation, instead of the nlink. And complain only *then* about hard links which are still in evidence even after killing off all the unreachable paths. Reported-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
2016-02-25 | jffs2: Fix page lock / f->sem deadlock | David Woodhouse
With this fix, all code paths should now be obtaining the page lock before f->sem. Reported-by: Szabó Tamás <sztomi89@gmail.com> Tested-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
2016-02-25 | Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin" | Thomas Betker
This reverts commit 5ffd3412ae55 ("jffs2: Fix lock acquisition order bug in jffs2_write_begin"). The commit modified jffs2_write_begin() to remove a deadlock with jffs2_garbage_collect_live(), but this introduced new deadlocks found by multiple users. page_lock() actually has to be called before mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because jffs2_write_end() and jffs2_readpage() are called with the page locked, and they acquire c->alloc_sem and f->sem, resp. In other words, the lock order in jffs2_write_begin() was correct, and it is the jffs2_garbage_collect_live() path that has to be changed. Revert the commit to get rid of the new deadlocks, and to clear the way for a better fix of the original deadlock. Reported-by: Deng Chao <deng.chao1@zte.com.cn> Reported-by: Ming Liu <liu.ming50@gmail.com> Reported-by: wangzaiwei <wangzaiwei@top-vision.cn> Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org
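The rule the revert restores (page lock first, then f->sem) can be checked mechanically, in the spirit of lockdep. A minimal single-threaded sketch, assuming hypothetical lock ranks and a per-task tracker; it is not jffs2 or lockdep code.

```c
#include <assert.h>

/* Hypothetical ranks encoding the required order: page lock before f->sem. */
enum lock_rank { RANK_PAGE_LOCK = 1, RANK_F_SEM = 2 };

struct lock_tracker {
	int held[8];   /* ranks currently held, strictly increasing */
	int depth;
};

/* Returns 0 if taking `rank` respects the order, -1 if it would invert it. */
static int track_acquire(struct lock_tracker *lt, int rank)
{
	assert(lt->depth < 8);
	/* The top of the stack is the highest rank held; taking a lower or
	 * equal rank while holding it is an ordering violation. */
	if (lt->depth > 0 && lt->held[lt->depth - 1] >= rank)
		return -1;
	lt->held[lt->depth++] = rank;
	return 0;
}

static void track_release(struct lock_tracker *lt)
{
	assert(lt->depth > 0);
	lt->depth--;
}
```

A write_begin-style path (page lock, then f->sem) passes; a path that already holds f->sem and then takes a page lock is flagged, which is why the commit says the garbage-collection path is the one that has to change.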
2016-02-24 | orangefs: clean up fill_default_sys_attrs | Martin Brandenburg
Size and type are read-only and not in the mask. The times were left unset despite being in the mask. We zero-fill the times since the server will fill them in and we will get the correct time when we fill the inode with getattr. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | orangefs: we never lookup with sym_follow set | Martin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | orangefs: remove vestigial async io code | Martin Brandenburg
I have verified that there is nothing in the userspace daemon version we are implementing this protocol against that ever looks at this field. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | orangefs: use ORANGEFS_NAME_LEN everywhere; remove ORANGEFS_NAME_MAX | Martin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | orangefs: don't d_drop in d_revalidate since the caller will | Martin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | orangefs: free readdir buffer index before the dir_emit loop | Martin Brandenburg
We only need it while the service operation is actually in progress since it is only used to co-ordinate the client-core's memory use. The kernel allocates its own space. Also clean up some comments which mislead the reader into thinking the readdir buffers are shared memory. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes - xattr one from this cycle, the rest - stable fodder"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/pnode.c: treat zero mnt_group_id-s as unequal
  affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()
  xattr handlers: plug a lock leak in simple_xattr_list
  fs: allow no_seek_end_llseek to actually seek
2016-02-24 | Orangefs: code sanitation | Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 | Orangefs: clean up orangefs_kernel_op_s comments. | Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-23 | Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
Pull NFS client bugfixes from Trond Myklebust:
"Stable bugfixes:
 - Fix nfs_size_to_loff_t
 - NFSv4: Fix a dentry leak on alias use
Other bugfixes:
 - Don't schedule a layoutreturn if the layout segment can be freed immediately.
 - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
 - rpcrdma_bc_receive_call() should init rq_private_buf.len
 - fix stateid handling for the NFS v4.2 operations
 - pnfs/blocklayout: fix a memory leak when using vmalloc_to_page
 - fix panic in gss_pipe_downcall() in fips mode
 - Fix a race between layoutget and pnfs_destroy_layout
 - Fix a race between layoutget and bulk recalls"
* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
  NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
  auth_gss: fix panic in gss_pipe_downcall() in fips mode
  pnfs/blocklayout: fix a memory leak when using vmalloc_to_page
  nfs4: fix stateid handling for the NFS v4.2 operations
  NFSv4: Fix a dentry leak on alias use
  xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
  pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
  pNFS: Fix pnfs_mark_matching_lsegs_return()
  nfs: fix nfs_size_to_loff_t
2016-02-23 | f2fs: avoid hungtask problem caused by losing wake_up | Yunlei He
The task sleeping in D state in wait_on_all_pages_writeback should be woken by f2fs_write_end_io once all writeback pages have been successfully written to the device. However, the wake_up can arrive between get_pages and io_schedule. In that case the wake_up may be lost, and the task stays in D state even though all pages have been written back to the device; eventually the whole system ends up in a hung-task state:
    if (!get_pages(sbi, F2FS_WRITEBACK))
        break;
            <--------- wake_up
    io_schedule();
Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Biao He <hebiao6@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
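The general rule behind this fix is that checking the condition and going to sleep must be atomic with respect to the waker. A userspace analogue with POSIX threads (not the kernel wait-queue API, and not the patch itself); `wb_state`, `end_io` and `wait_all_writeback` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

struct wb_state {
	pthread_mutex_t lock;
	pthread_cond_t  all_done;
	int             pages_under_writeback;
};

/* Completion side (cf. f2fs_write_end_io): decrement and wake under the lock. */
static void end_io(struct wb_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->pages_under_writeback == 0)
		pthread_cond_broadcast(&s->all_done);
	pthread_mutex_unlock(&s->lock);
}

/* Waiting side: the count check and the sleep happen under the same lock,
 * so a wake-up cannot slip in between "count is nonzero" and "sleep". */
static void wait_all_writeback(struct wb_state *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->pages_under_writeback != 0)
		pthread_cond_wait(&s->all_done, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Thread body simulating one page finishing writeback. */
static void *writer(void *arg)
{
	end_io(arg);
	return NULL;
}
```

pthread_cond_wait() releases the mutex and sleeps atomically, which is exactly the window the unprotected get_pages()/io_schedule() sequence above leaves open.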
2016-02-23 | Btrfs: fix lockdep deadlock warning due to dev_replace | Liu Bo
Xfstests btrfs/011 complains about a deadlock warning:
[ 1226.649039] =========================================================
[ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
[ 1226.649039] 4.1.0+ #270 Not tainted
[ 1226.649039] ---------------------------------------------------------
[ 1226.652955] kswapd0/46 just changed the state of lock:
[ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}
and interrupts could create inverse lock ordering between them.
[ 1226.652955] other info that might help us debug this:
[ 1226.652955] Chain exists of: &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
[ 1226.652955]  Possible interrupt unsafe locking scenario:
[ 1226.652955]        CPU0                    CPU1
[ 1226.652955]        ----                    ----
[ 1226.652955]   lock(&fs_info->dev_replace.lock);
[ 1226.652955]                                local_irq_disable();
[ 1226.652955]                                lock(&delayed_node->mutex);
[ 1226.652955]                                lock(&found->groups_sem);
[ 1226.652955]   <Interrupt>
[ 1226.652955]     lock(&delayed_node->mutex);
[ 1226.652955]  *** DEADLOCK ***
Commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest.") tried to fix a similar problem with exactly the same warning, but even with that, we still run into this. The above lock chain comes from:
  btrfs_commit_transaction
  ->btrfs_run_delayed_items
  ...
  ->__btrfs_update_delayed_inode
  ...
  ->__btrfs_cow_block
  ...
  ->find_free_extent
  ->cache_block_group
  ->load_free_space_cache
  ->btrfs_readpages
  ->submit_one_bio
  ...
  ->__btrfs_map_block
  ->btrfs_dev_replace_lock
However, under high memory pressure, tasks which hold dev_replace.lock can be interrupted by kswapd, which then tries to release memory occupied by superblocks, inodes and dentries; there we may call evict_inode, which leads to:
[ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
[ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and this leads to an ABBA deadlock. To fix this, we can use the "blocking rwlock" scheme used for extent_buffer, but things are simpler here since we only need to turn the read-side spinlock into a blocking lock. With this change, btrfs/011 no longer produces warnings in dmesg.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-23 | btrfs: drop unused argument in btrfs_ioctl_get_supported_features | David Sterba
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-23 | btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls | David Sterba
The control device is accessible when no filesystem is mounted and we may want to query features supported by the module. This is already possible using the sysfs files, this ioctl is for parity and convenience. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-23 | btrfs: change max_inline default to 2048 | David Sterba
The current practical default is ~4k on x86_64 (the logic is more complex; simplified here for brevity). Inlined files land in the metadata group and thus consume space that could be needed for real metadata. Inlining brings some usability surprises:
1) total space consumption measured on various filesystems and on btrfs with DUP metadata was noticeably higher, because the inlined data is duplicated within metadata
2) inlined data may exhaust the metadata, which is more precious when the entire device space is allocated to chunks (i.e. balance cannot make the space more compact)
3) performance suffers a bit, as the inlined blocks are duplicated and stored far away on the device.
Proposed fix: set the default to 2048.
This fixes 1) in particular: the total filesystem space consumption will be on par with other filesystems. It partially fixes 2), as more data is pushed to the data block groups. The characteristics of 3) depend on the actual small-file size distribution. The change is independent of the metadata block group type (though it's most visible with DUP) and of the system page size, as these parameters are not trivial to find out, compared to file size.
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-23 | btrfs: remove error message from search ioctl for nonexistent tree | David Sterba
Let's remove the error message that appears when the tree_id is not present. This can happen with the quota tree and has been observed in practice. The applications are supposed to handle -ENOENT and we don't need to report that in the system log as it's not a fatal error. Reported-by: Vlastimil Babka <vbabka@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-23 | btrfs: avoid uninitialized variable warning | Arnd Bergmann
With CONFIG_SMP and CONFIG_PREEMPT both disabled, gcc decides to partially inline the get_state_failrec() function but cannot figure out that means the failrec pointer is always valid if the function returns success, which causes a harmless warning: fs/btrfs/extent_io.c: In function 'clean_io_failure': fs/btrfs/extent_io.c:2131:4: error: 'failrec' may be used uninitialized in this function [-Werror=maybe-uninitialized] This marks get_state_failrec() and set_state_failrec() both as 'noinline', which avoids the warning in all cases for me, and seems less ugly than adding a fake initialization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 47dc196ae719 ("btrfs: use proper type for failrec in extent_state") Signed-off-by: David Sterba <dsterba@suse.com>
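A sketch of the "noinline getter with an out-parameter" pattern the commit chooses over a fake initialization. `failrec`, `get_failrec` and `clean_failure` are hypothetical stand-ins loosely modeled on get_state_failrec() and clean_io_failure(); `__attribute__((noinline))` is the GCC/Clang attribute behind the kernel's `noinline`.

```c
#include <assert.h>

struct failrec { int bio_failed; };   /* hypothetical stand-in */

/* noinline stops gcc from (partially) inlining the getter into its caller,
 * the situation in which the commit reports the false-positive warning. */
static __attribute__((noinline)) int get_failrec(struct failrec *table, int n,
                                                 int idx, struct failrec **out)
{
	if (idx < 0 || idx >= n)
		return -1;          /* error: *out deliberately left untouched */
	*out = &table[idx];     /* success: *out is always set */
	return 0;
}

static int clean_failure(struct failrec *table, int n, int idx)
{
	struct failrec *failrec;   /* no fake "= NULL" initialization */

	if (get_failrec(table, n, idx, &failrec))
		return -1;
	failrec->bio_failed = 0;   /* only reached when the getter succeeded */
	return 0;
}
```

The caller only dereferences `failrec` on the success path, so the code is correct either way; the attribute only changes what the compiler's flow analysis sees.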
2016-02-22 | f2fs: trace old block address for CoWed page | Chao Yu
This patch enables tracing of the old block address of a CoWed page for better debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: try to flush inode after merging inline data | Chao Yu
When flushing node pages, if the current node page is an inline inode page, we try to merge inline data from the data page into the inline inode page and then skip flushing the current node page. This decreases the number of nodes flushed in a batch in this round, which may lead to worse performance. For performance, this patch gives just-merged inline inode pages a chance to be flushed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: show more info about superblock recovery | Chao Yu
This patch shows more information in the message log about the recovery of a corrupted superblock during ->mount, e.g. the index of the corrupted superblock and the result of recovery. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: fix the wrong stat count of calling gc | Chao Yu
On a partition formatted with multiple segments per section, we counted GC calls incorrectly. For example, for a partition with segs_per_sec = 4:
cat /sys/kernel/debug/f2fs/status
GC calls: 208 (BG: 7)
  - data segments : 104 (52)
  - node segments : 104 (24)
The GC call count should be (104 (data segs) + 104 (node segs)) / 4 = 52, rather than 208. Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
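The corrected count is simple arithmetic: following the commit, collected segments are divided by segs_per_sec. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Following the commit's arithmetic: one GC call covers a whole section,
 * i.e. segs_per_sec segments, so divide the collected segments by it. */
static unsigned int gc_calls(unsigned int data_segs, unsigned int node_segs,
                             unsigned int segs_per_sec)
{
	assert(segs_per_sec > 0);
	return (data_segs + node_segs) / segs_per_sec;
}
```

With the numbers above, `gc_calls(104, 104, 4)` gives 52; only with one segment per section would 208 be right.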
2016-02-22 | f2fs: remain last victim segment number ascending order | Jaegeuk Kim
This patch avoids keeping an inefficient last victim segment number after a victim is selected. For example, if all the dirty segments have the same number of valid blocks, we can get victim segments in descending order because the wrong last segment number is kept. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
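A hypothetical model of how the remembered position determines victim order when costs tie: a greedy scan that starts at `last_victim` and wraps around. It is not f2fs's get_victim_by_default(); it only shows that resuming after the chosen victim yields ascending order, while remembering where the scan stopped yields descending order.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEGS 8

/* Hypothetical policy state; valid_blocks is the cost of each dirty segment. */
struct gc_policy {
	unsigned int valid_blocks[NSEGS];
	unsigned int last_victim;          /* where the next scan starts */
};

static unsigned int select_victim(struct gc_policy *p, bool resume_after_victim)
{
	unsigned int start = p->last_victim % NSEGS;
	unsigned int best = start, seg = start;

	for (unsigned int i = 1; i < NSEGS; i++) {
		seg = (start + i) % NSEGS;
		if (p->valid_blocks[seg] < p->valid_blocks[best])
			best = seg;   /* strict '<': ties keep the earlier-scanned segment */
	}
	/* Remembering where the scan stopped (seg) instead of the victim makes
	 * equal-cost victims come out in descending order. */
	p->last_victim = resume_after_victim ? (best + 1) % NSEGS : seg;
	return best;
}
```

With all costs equal, resuming after the victim picks segments 0, 1, 2, ...; remembering the scan end picks 0, 7, 6, ....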
2016-02-22 | f2fs: reuse read_inline_data for f2fs_convert_inline_page | Shawn Lin
f2fs_convert_inline_page duplicates what read_inline_data already does to copy the inline data out of inode_page. We can use read_inline_data instead to simplify the code. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: fix to delete old dirent in converted inline directory in ->rename | Chao Yu
When testing with fstests/generic/068 on f2fs with inline_dentry enabled, the following oops is reported in dmesg:
------------[ cut here ]------------
WARNING: CPU: 5 PID: 11841 at fs/inode.c:273 drop_nlink+0x49/0x50()
Modules linked in: f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
CPU: 5 PID: 11841 Comm: fsstress Tainted: G O 4.5.0-rc1 #45
Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
 0000000000000111 ffff88009cdf7ae8 ffffffff813e5944 0000000000002e41
 0000000000000000 0000000000000111 0000000000000000 ffff88009cdf7b28
 ffffffff8106a587 ffff88009cdf7b58 ffff8804078fe180 ffff880374a64e00
Call Trace:
 [<ffffffff813e5944>] dump_stack+0x48/0x64
 [<ffffffff8106a587>] warn_slowpath_common+0x97/0xe0
 [<ffffffff8106a5ea>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81231039>] drop_nlink+0x49/0x50
 [<ffffffffa07b95b4>] f2fs_rename2+0xe04/0x10c0 [f2fs]
 [<ffffffff81231ff1>] ? lock_two_nondirectories+0x81/0x90
 [<ffffffff813f454d>] ? lockref_get+0x1d/0x30
 [<ffffffff81220f70>] vfs_rename+0x2e0/0x640
 [<ffffffff8121f9db>] ? lookup_dcache+0x3b/0xd0
 [<ffffffff810b8e41>] ? update_fast_ctr+0x21/0x40
 [<ffffffff8134ff12>] ? security_path_rename+0xa2/0xd0
 [<ffffffff81224af6>] SYSC_renameat2+0x4b6/0x540
 [<ffffffff810ba8ed>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810022ba>] ? exit_to_usermode_loop+0x7a/0xd0
 [<ffffffff817e0ade>] ? int_ret_from_sys_call+0x52/0x9f
 [<ffffffff810bdc90>] ? trace_hardirqs_on_caller+0x100/0x1c0
 [<ffffffff81224b8e>] SyS_renameat2+0xe/0x10
 [<ffffffff8121f08e>] SyS_rename+0x1e/0x20
 [<ffffffff817e0957>] entry_SYSCALL_64_fastpath+0x12/0x6f
---[ end trace 2b31e17995404e42 ]---
This happens because, in the same inline directory, when we rename a file from a source name to a target name that does not exist yet and there is not enough inline dentry space, inline conversion is triggered, after which all data in the inline dentry is moved to a normal dentry page. After attaching the new entry in the converted dentry page, we still try to remove the old entry from the original inline dentry. Since the old entry has already been moved, this has no effect, leaving the old entry in the converted dentry page. Now we have two valid dentries pointing to the same inode, which has an nlink value of 1; when both are deleted, the above warning appears. This issue can be reproduced easily with the following steps:
1. mount f2fs with the inline_dentry option
2. mkdir dir
3. touch 180 files named [001-180] in dir
4. rename dir/180 dir/181
5. rm dir/180 dir/181
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
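The reproduction steps above can be transcribed into C. A sketch, assuming the caller has already created `dir` (step 2) on f2fs mounted with inline_dentry (step 1); on other filesystems it should simply report no problem. Instead of step 5 it checks whether the old name survives the rename, which is the stale dentry the commit describes. `reproduce_inline_rename` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the stale source entry survives the rename (the bug),
 * 0 if the filesystem behaves correctly, -1 on any other error. */
static int reproduce_inline_rename(const char *dir)
{
	char path[4096], target[4096];
	struct stat st;

	for (int i = 1; i <= 180; i++) {          /* step 3: files 001 .. 180 */
		snprintf(path, sizeof(path), "%s/%03d", dir, i);
		int fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0)
			return -1;
		close(fd);
	}

	snprintf(path, sizeof(path), "%s/180", dir);  /* step 4 */
	snprintf(target, sizeof(target), "%s/181", dir);
	if (rename(path, target) != 0)
		return -1;

	/* After a correct rename the old name must no longer exist. */
	return lstat(path, &st) == 0 ? 1 : 0;
}
```

Point `dir` at a fresh directory on the f2fs mount to exercise the bug.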
2016-02-22 | f2fs: detect error of update_dent_inode in ->rename | Chao Yu
We should check the return value of update_dent_inode in ->rename and return the correct error. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: move sanity checking of cp into get_valid_checkpoint | Shawn Lin
From the name of get_valid_checkpoint, it seems to return either a valid cp or NULL for the caller to check. If no valid one is found, f2fs_fill_super prints an error log. But if get_valid_checkpoint returns one that looks valid (the return value indicates it is valid, yet it actually fails the subsequent sanity check), f2fs_fill_super prints another, similar error log. That seems strange. Let's keep the sanity check inside the procedure of getting a valid cp. Another improvement gained from this move is that, even with large volume support, we check the cp in advance and skip the following procedure if it fails the sanity check. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: slightly reorganize read_raw_super_block | Shawn Lin
read_raw_super_block was introduced to help find the first valid superblock. Commit da554e48caab ("f2fs: recovering broken superblock during mount") changed its behaviour to read both superblocks and check whether the recovery flag is needed. So the comment before this function is no longer consistent with what it actually does. Also, the original code uses two labels to handle the error cases, which isn't very readable. So this patch amends the comment and slightly reorganizes the function. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 | f2fs: reorder nat cache lock in cache_nat_entry | Chao Yu
When looking up a nat entry in cache_nat_entry, if we miss in the nat cache, we try to load nat entries a) from the journal of the current segment cache or b) from NAT pages for updating. During this process, the write lock of nat_tree_lock is held to avoid inconsistency between the nid cache and the nat cache caused by races among the nat entry shrinker, the checkpointer and the nat entry updater. But this makes updating the nat cache inefficient, because it serializes accesses to the journal cache and reads of NAT pages. Here, we reorder the locking and update flow as below to improve access concurrency:
- get_node_info
  - down_read(nat_tree_lock)
  - lookup nat cache --- hit -> unlock & return
  - lookup journal cache --- hit -> unlock & goto update
  - up_read(nat_tree_lock)
update:
  - down_write(nat_tree_lock)
  - cache_nat_entry
  - lookup nat cache --- nohit -> update
  - up_write(nat_tree_lock)
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
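The reordered flow is the usual "shared lock for lookup, exclusive lock plus re-check for update" pattern. A much-simplified sketch with a POSIX rwlock standing in for nat_tree_lock; `nat_cache`, `read_nat_entry` and `get_node_blkaddr` are hypothetical, and the journal and NAT-page reads are collapsed into one stand-in function.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NAT_SLOTS 64

/* Hypothetical NAT cache: one slot per nid, guarded by a rwlock. */
struct nat_cache {
	pthread_rwlock_t lock;
	bool             valid[NAT_SLOTS];
	unsigned int     blkaddr[NAT_SLOTS];
};

/* Stand-in for reading the entry from the journal or a NAT page. */
static unsigned int read_nat_entry(unsigned int nid)
{
	return 1000 + nid;
}

static unsigned int get_node_blkaddr(struct nat_cache *c, unsigned int nid)
{
	unsigned int addr;

	assert(nid < NAT_SLOTS);

	/* Lookup under the shared lock: concurrent lookups don't serialize. */
	pthread_rwlock_rdlock(&c->lock);
	if (c->valid[nid]) {
		addr = c->blkaddr[nid];
		pthread_rwlock_unlock(&c->lock);
		return addr;
	}
	pthread_rwlock_unlock(&c->lock);

	/* Miss: do the slow read without holding the exclusive lock. */
	addr = read_nat_entry(nid);

	/* Update under the exclusive lock, re-checking for a racing insert. */
	pthread_rwlock_wrlock(&c->lock);
	if (!c->valid[nid]) {
		c->valid[nid] = true;
		c->blkaddr[nid] = addr;
	}
	addr = c->blkaddr[nid];
	pthread_rwlock_unlock(&c->lock);
	return addr;
}
```

The re-check under the write lock corresponds to the "lookup nat cache --- nohit -> update" step: another task may have inserted the entry while no lock was held.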
2016-02-22 | f2fs: split journal cache from curseg cache | Chao Yu
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info;
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach:
1) lock contention may be higher when we access or update both parts of the cache, since the same mutex, curseg_mutex, protects the whole cache;
2) when flushing, the current summary block, together with the last journal info, is written back to the device as a normal summary block; however, journal info is treated as valid only in the current summary, so most normal summary blocks contain junk journal data, which wastes the remaining space of the summary block.
So, to fix the above issues, we split the curseg cache into two parts:
a) the current summary block, protected by the original mutex, curseg_mutex;
b) the journal cache, protected by a newly introduced r/w semaphore, journal_rwsem.
When loading the curseg cache during ->mount, we store summary info and journal info into separate caches; when doing a checkpoint, we combine the data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>