Age | Commit message (Collapse) | Author |
|
In preparation for moving the filelayout getdeviceinfo call from
filelayout_alloc_lseg called by pnfs_process_layout
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
URLs to ftp.kernel.org are still exist though the service is closed [0].
This commit fixes the URLs to use www.kernel.org instead.
[0] https://www.kernel.org/shutting-down-ftp-services.html
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Now that nfs_rename()'s d_move has moved within the RPC task's rpc_call_done
callback, rehashing new_dentry will actually rehash the old dentry's name
in nfs_rename(). d_move() is going to rehash the new dentry for us anyway,
so doing it again here is unnecessary.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
We want the char-misc fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single kernfs fix for 4.11-rc4 that resolves a reported
issue.
It has been in linux-next with no reported issues"
* tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a memory leak on an error path, and two races when modifying
inodes relating to the inline_data and metadata checksum features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix two spelling nits
ext4: lock the xattr block before checksuming it
jbd2: don't leak memory if setting up journal fails
ext4: mark inode dirty after converting inline directory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypto fixes from Ted Ts'o:
"A code cleanup and bugfix for fs/crypto"
* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: eliminate ->prepare_context() operation
fscrypt: remove broken support for detecting keyring key revocation
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.
https://bugzilla.kernel.org/show_bug.cgi?id=193661
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
This patch allow write data to normal file when writting
new checkpoint.
We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds busy poll support to epoll. The implementation is meant to
be opportunistic in that it will take the NAPI ID from the last socket
that is added to the ready list that contains a valid NAPI ID and it will
use that for busy polling until the ready list goes empty. Once the ready
list goes empty the NAPI ID is reset and busy polling is disabled until a
new socket is added to the ready list.
In addition when we insert a new socket into the epoll we record the NAPI
ID and assume we are going to receive events on it. If that doesn't occur
it will be evicted as the active NAPI ID and we will resume normal
behavior.
An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket
options to spread the incoming connections to specific worker threads
based on the incoming queue. This enables epoll for each worker thread
to have only sockets that receive packets from a single queue. So when an
application calls epoll_wait() and there are no events available to report,
busy polling is done on the associated queue to pull the packets.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch flips the logic we were using to determine if the busy polling
has timed out. The main motivation for this is that we will need to
support two different possible timeout values in the future and by
recording the start time rather than when we would want to end we can focus
on making the end_time specific to the task be it epoll or socket based
polling.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In this patch, we change xattr block disk layout as below:
Before:
xattr node block layout
+---------------------------------------------+---------------+-------------+
| node block xattr entries | reserved | node footer |
| 4068 Bytes | 4 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4068 Bytes | 4 Bytes |
After:
xattr node block layout
+-------------------------------------------------------------+-------------+
| node block xattr entries | node footer |
| 4072 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4072 Bytes | 4 Bytes |
With this change, we don't need to reserve additional space in node block,
just keep reserved space in logical in-memory layout. So that it would help
to enlarge valid free space of xattr node block.
As tested, generic/026 shows max stored xattr entires number increases from
531 to 532 when inline_xattr option is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. don't allocate redundant memory in read_all_xattrs.
2. introduce RESERVED_XATTR_SIZE for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't track volatile file in dirty inode list, otherwise with data_flush
option, background thread will entry into endless loop for flushing
journal file's pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to show the max number of volatile operations which are
conducting concurrently.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In below concurrent case, allocated nid can be loaded into free nid cache
and be allocated again.
Thread A Thread B
- f2fs_create
- f2fs_new_inode
- alloc_nid
- __insert_nid_to_list(ALLOC_NID_LIST)
- f2fs_balance_fs_bg
- build_free_nids
- __build_free_nids
- scan_nat_page
- add_free_nid
- __lookup_nat_cache
- f2fs_add_link
- init_inode_metadata
- new_inode_page
- new_node_page
- set_node_addr
- alloc_nid_done
- __remove_nid_from_list(ALLOC_NID_LIST)
- __insert_nid_to_list(FREE_NID_LIST)
This patch makes nat cache lookup and free nid list operation being atomical
to avoid this race condition.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use set_page_private marcro instead of operte page struct
directly
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When doing garbage collection, we try to record segment offset which
locates at next one of last victim, using it as the start offset in
next searching.
But in some corner cases, recorded offset may cross the end of main
segment area, it will cause incorrectly searching in dirty_segmap
bitmap. This patch adds modular operation to avoid this issue.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Zygo tracked down a very old bug with inline compressed extents.
I didn't tag this one for stable because I want to do individual
tested backports. It's a little tricky and I'd rather do some extra
testing on it along the way"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: add missing memset while reading compressed inline extents
Btrfs: fix regression in lock_delalloc_pages
btrfs: remove btrfs_err_str function from uapi/linux/btrfs.h
|
|
The latest gcc-7.0.1 snapshot warns about an unintialized variable use:
In file included from fs/reiserfs/lbalance.c:8:0:
fs/reiserfs/lbalance.c: In function 'leaf_item_bottle.isra.3':
fs/reiserfs/reiserfs.h:1279:13: error: '*((void *)&n_ih+8).v' may be used uninitialized in this function [-Werror=maybe-uninitialized]
v2->v = (v2->v & cpu_to_le64(15ULL << 60)) | cpu_to_le64(offset);
~~^~~
fs/reiserfs/reiserfs.h:1279:13: error: '*((void *)&n_ih+8).v' may be used uninitialized in this function [-Werror=maybe-uninitialized]
v2->v = (v2->v & cpu_to_le64(15ULL << 60)) | cpu_to_le64(offset);
This happens because the offset/type pair that is stored in
ih.key.u.k_offset_v2 is actually uninitialized when we call
set_le_ih_k_offset() and set_le_ih_k_type(). After we have called both,
all data is correct, but the first of the two reads uninitialized data
for the type field and writes it back before it gets overwritten.
This works around the warning by initializing the k_offset_v2 through
the slightly larger memcpy().
[JK: Remove now unused define and make it obvious we initialize the key]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
When block device is closed, we call inode_detach_wb() in __blkdev_put()
which sets inode->i_wb to NULL. That is contrary to expectations that
inode->i_wb stays valid once set during the whole inode's lifetime and
leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
inode_to_wb() returned NULL.
The reason why we called inode_detach_wb() is not valid anymore though.
BDI is guaranteed to stay along until we call bdi_put() from
bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
moment.
Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.
Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When disk->fops->open() in __blkdev_get() returns -ERESTARTSYS, we
restart the process of opening the block device. However we forget to
switch bdev->bd_bdi back to noop_backing_dev_info and as a result bdev
inode will be pointing to a stale bdi. Fix the problem by setting
bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The memory size of f2fs_stat_info also should be calculated.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed,
so that f2fs should not increase the stat of atomic_write.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The crc_offset towards or beyond the end of block is wrong,
sanity check it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As discuss with Jaegeuk and Chao,
"Once checkpoint is done, f2fs doesn't need to update there-in filename at all."
The disk-level filename is used only one case,
1. create a file A under a dir
2. sync A
3. godown
4. umount
5. mount (roll_forward)
Only the rename/cross_rename changes the filename, if it happens,
a. between step 1 and 2, the sync A will caused checkpoint, so that,
the roll_forward at step 5 never happens.
b. after step 2, the roll_forward happens, file A will roll forward
to the result as after step 1.
So that, any updating the disk filename is useless, just cleanup it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be
updated atomically, use nid_list_lock cover them to avoid race in
concurrent scenario.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL",
so that, the prefetchw(&page->flags) is operated on NULL.
Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and
the return value from truncate_inline_inode isn't used, remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It's needless of mnt_want_write_file for arguments checking.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The inode_newsize_ok is better than only checking the maxbytes,
eg. the rlimit etc.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If move file range return error, the data copied to user-space is duplicate.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
use a slightly simpler expression to calculate nat block with nid.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Inject a fault during f2fs_truncate().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch checks the parameter range passed by ioctl to void that range
exceeds the max_file_blocks limit.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch add a function to punch discard command if one segment
reuse before discard. Split this segment from multi-segments discard
range, and discard the left bigger range.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's allocate a bio when issuing discard commands later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Skip writeback meta pages if cp_mutex lock acquire failed, cp will
flush dirty pages instead.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This case is not caused by fsck.f2fs. User needs to retry mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fixes: 3cf4574705 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The nat entry is listed from the set list for freeing,
it's duplicate to do radix tree lookup again.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
[Jaegeuk Kim: remove unnecessary f2fs_bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The root device's issue flush trace is missing,
add it and tracing the result from submit.
Fixes d50aaeec90 ("f2fs: show actual device info in tracepoints")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now f2fs only supports volatile writes for journal db regular file.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The atomic writes only supports regular files for database.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When I forced to enable atomic operations intentionally, I could hit the below
panic, since we didn't clear page->private in f2fs_invalidate_page called by
file truncation.
The panic occurs due to NULL mapping having page->private.
BUG: unable to handle kernel paging request at ffffffffffffffff
IP: drop_buffers+0x38/0xe0
PGD 5d00c067
PUD 5d00e067
PMD 0
CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff9151952863c0 task.stack: ffffaaec40db4000
RIP: 0010:drop_buffers+0x38/0xe0
RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292
Call Trace:
? page_referenced+0x8b/0x170
try_to_free_buffers+0xc5/0xe0
try_to_release_page+0x49/0x50
shrink_page_list+0x8bc/0x9f0
shrink_inactive_list+0x1dd/0x500
? shrink_active_list+0x2c0/0x430
shrink_node_memcg+0x5eb/0x7c0
shrink_node+0xe1/0x320
do_try_to_free_pages+0xef/0x2e0
try_to_free_pages+0xe9/0x190
__alloc_pages_slowpath+0x390/0xe70
__alloc_pages_nodemask+0x291/0x2b0
alloc_pages_current+0x95/0x140
__page_cache_alloc+0xc4/0xe0
pagecache_get_page+0xab/0x2a0
grab_cache_page_write_begin+0x20/0x40
get_read_data_page+0x2e6/0x4c0 [f2fs]
? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
? truncate_data_blocks_range+0x238/0x2b0 [f2fs]
get_lock_data_page+0x30/0x190 [f2fs]
__exchange_data_block+0xaaf/0xf40 [f2fs]
f2fs_fallocate+0x418/0xd00 [f2fs]
vfs_fallocate+0x157/0x220
SyS_fallocate+0x48/0x80
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Chao Yu: use INMEM_INVALIDATE for better tracing]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|