Age | Commit message (Collapse) | Author |
|
Passing dentry to some helpers is unnecessary. Simplify these cases.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If memory for uperredirect was allocated with kstrdup() in upperdir != NULL
and d.redirect != NULL path, it may seem that it can be lost when
upperredirect is reassigned later, but it's not possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0a2d0d3f2f291 ("ovl: Check redirect on index as well")
Signed-off-by: Stanislav Goriainov <goriainov@ispras.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Those two cleanup routines are using the helper ovl_dir_read() with the
merge dir filler, which populates an rb tree, that is never used.
The index dir entry names all have a long (42 bytes) constant prefix, so it
is not surprising that perf top has demostrated high CPU usage by rb tree
population during cleanup of a large index dir:
- 9.53% ovl_fill_merge
- 78.41% ovl_cache_entry_find_link.constprop.27
+ 72.11% strncmp
Use the plain list filler that does not populate the unneeded rb tree.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_indexdir_cleanup() is called on mount of overayfs with nfs_export
feature to cleanup stale index records for lower and upper files that have
been deleted while overlayfs was offline.
This has the side effect (good or bad) of pre populating inode cache with
all the copied up upper inodes, while verifying the index entries.
For copied up directories, the upper file handles are decoded to conncted
upper dentries. This has the even bigger side effect of reading the
content of all the parent upper directories which may take significantly
more time and IO than just reading the upper inodes.
Do not request connceted upper dentries for verifying upper directory index
entries, because we have no use for the connected dentry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Fix two typos.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.
Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There is a wrong case of link() on overlay:
$ mkdir /lower /fuse /merge
$ mount -t fuse /fuse
$ mkdir /fuse/upper /fuse/work
$ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\
workdir=work
$ touch /merge/file
$ chown bin.bin /merge/file // the file's caller becomes "bin"
$ ln /merge/file /merge/lnkfile
Then we will get an error(EACCES) because fuse daemon checks the link()'s
caller is "bin", it denied this request.
In the changing history of ovl_link(), there are two key commits:
The first is commit bb0d2b8ad296 ("ovl: fix sgid on directory") which
overrides the cred's fsuid/fsgid using the new inode. The new inode's
owner is initialized by inode_init_owner(), and inode->fsuid is
assigned to the current user. So the override fsuid becomes the
current user. We know link() is actually modifying the directory, so
the caller must have the MAY_WRITE permission on the directory. The
current caller may should have this permission. This is acceptable
to use the caller's fsuid.
The second is commit 51f7e52dc943 ("ovl: share inode for hard link")
which removed the inode creation in ovl_link(). This commit move
inode_init_owner() into ovl_create_object(), so the ovl_link() just
give the old inode to ovl_create_or_link(). Then the override fsuid
becomes the old inode's fsuid, neither the caller nor the overlay's
mounter! So this is incorrect.
Fix this bug by using ovl mounter's fsuid/fsgid to do underlying
fs's link().
Link: https://lore.kernel.org/all/20220817102952.xnvesg3a7rbv576x@wittgenstein/T
Link: https://lore.kernel.org/lkml/20220825130552.29587-1-zhangtianci.1997@bytedance.com/t
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Fixes: 51f7e52dc943 ("ovl: share inode for hard link")
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The "buf" flexible array needs to be the memcpy() destination to avoid
false positive run-time warning from the recent FORTIFY_SOURCE
hardening:
memcpy: detected field-spanning write (size 93) of single field "&fh->fb"
at fs/overlayfs/export.c:799 (size 21)
Reported-by: syzbot+9d14351a171d0d1c7955@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000763a6c05e95a5985@google.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When insert and remove the orangefs module, there are memory leaked
as below:
unreferenced object 0xffff88816b0cc000 (size 2048):
comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
hex dump (first 32 bytes):
6e 6f 6e 65 0a 00 00 00 00 00 00 00 00 00 00 00 none............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000005b405fee>] orangefs_debugfs_init.cold+0xaf/0x17f
[<00000000e5a0085b>] 0xffffffffa02780f9
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Use the golbal variable as the buffer rather than dynamic allocate to
slove the problem.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
When insert and remove the orangefs module, there are kobjects memory
leaked as below:
unreferenced object 0xffff88810f95af00 (size 64):
comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
hex dump (first 32 bytes):
a0 83 af 01 81 88 ff ff 08 af 95 0f 81 88 ff ff ................
08 af 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000005a6e4dfe>] orangefs_sysfs_init+0x42/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810f95ae80 (size 64):
comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
hex dump (first 32 bytes):
c8 90 0f 02 81 88 ff ff 88 ae 95 0f 81 88 ff ff ................
88 ae 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000001a4841fa>] orangefs_sysfs_init+0xc7/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810f95ae00 (size 64):
comm "insmod", pid 783, jiffies 4294813440 (age 65.511s)
hex dump (first 32 bytes):
60 87 a1 00 81 88 ff ff 08 ae 95 0f 81 88 ff ff `...............
08 ae 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000005915e797>] orangefs_sysfs_init+0x12b/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810f95ad80 (size 64):
comm "insmod", pid 783, jiffies 4294813440 (age 65.511s)
hex dump (first 32 bytes):
78 90 0f 02 81 88 ff ff 88 ad 95 0f 81 88 ff ff x...............
88 ad 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000007a14eb35>] orangefs_sysfs_init+0x1ac/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810f95ac00 (size 64):
comm "insmod", pid 783, jiffies 4294813440 (age 65.531s)
hex dump (first 32 bytes):
e0 ff 67 02 81 88 ff ff 08 ac 95 0f 81 88 ff ff ..g.............
08 ac 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000001f38adcb>] orangefs_sysfs_init+0x291/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810f95ab80 (size 64):
comm "insmod", pid 783, jiffies 4294813441 (age 65.530s)
hex dump (first 32 bytes):
50 bf 2f 02 81 88 ff ff 88 ab 95 0f 81 88 ff ff P./.............
88 ab 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000031ab7788>] kmalloc_trace+0x27/0xa0
[<000000009cc7d95b>] orangefs_sysfs_init+0x2f5/0x3a0
[<00000000722645ca>] 0xffffffffa02780fe
[<000000004232d9f7>] do_one_initcall+0x87/0x2a0
[<0000000054f22384>] do_init_module+0xdf/0x320
[<000000003263bdea>] load_module+0x2f98/0x3330
[<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
[<00000000250ae02b>] do_syscall_64+0x35/0x80
[<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Should add release function for each kobject_type to free the memory.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
When insert and remove the orangefs module, then debug_help_string will
be leaked:
unreferenced object 0xffff8881652ba000 (size 4096):
comm "insmod", pid 1701, jiffies 4294893639 (age 13218.530s)
hex dump (first 32 bytes):
43 6c 69 65 6e 74 20 44 65 62 75 67 20 4b 65 79 Client Debug Key
77 6f 72 64 73 20 61 72 65 20 75 6e 6b 6e 6f 77 words are unknow
backtrace:
[<0000000004e6f8e3>] kmalloc_trace+0x27/0xa0
[<0000000006f75d85>] orangefs_prepare_debugfs_help_string+0x5e/0x480 [orangefs]
[<0000000091270a2a>] _sub_I_65535_1+0x57/0xf70 [crc_itu_t]
[<000000004b1ee1a3>] do_one_initcall+0x87/0x2a0
[<000000001d0614ae>] do_init_module+0xdf/0x320
[<00000000efef068c>] load_module+0x2f98/0x3330
[<000000006533b44d>] __do_sys_finit_module+0x113/0x1b0
[<00000000a0da6f99>] do_syscall_64+0x35/0x80
[<000000007790b19b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
When remove the module, should always free debug_help_string. Should
always free the allocated buffer when change the free_debug_help_string.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
When the dev init failed, should cleanup the sysfs, otherwise, the
module will never be loaded since can not create duplicate sysfs
directory:
sysfs: cannot create duplicate filename '/fs/orangefs'
CPU: 1 PID: 6549 Comm: insmod Tainted: G W 6.0.0+ #44
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
sysfs_warn_dup.cold+0x17/0x24
sysfs_create_dir_ns+0x16d/0x180
kobject_add_internal+0x156/0x3a0
kobject_init_and_add+0xcf/0x120
orangefs_sysfs_init+0x7e/0x3a0 [orangefs]
orangefs_init+0xfe/0x1000 [orangefs]
do_one_initcall+0x87/0x2a0
do_init_module+0xdf/0x320
load_module+0x2f98/0x3330
__do_sys_finit_module+0x113/0x1b0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
kobject_add_internal failed for orangefs with -EEXIST, don't try to register things with the same name in the same directory.
Fixes: 2f83ace37181 ("orangefs: put register_chrdev immediately before register_filesystem")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
The variable buffer_index is assigned a value that is never read,
it is assigned just before the function returns. The assignment is
redundant and can be removed.
Cleans up clang scan build warning:
fs/orangefs/file.c:276:3: warning: Value stored to 'buffer_index'
is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
Variable i is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but
the state machine has not run yet, it's possible another thread can call
fscache_use_cookie and begin to use it.
When the cookie_worker finally runs, it will see the LRU_DISCARD flag
set, transition the cookie->state to LRU_DISCARDING, which will then
withdraw the cookie. Once the cookie is withdrawn the object is removed
the below oops will occur because the object associated with the cookie
is now NULL.
Fix the oops by clearing the LRU_DISCARD bit if another thread uses the
cookie before the cookie_worker runs.
BUG: kernel NULL pointer dereference, address: 0000000000000008
...
CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
...
Call Trace:
netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
process_one_work+0x217/0x3e0
worker_thread+0x4a/0x3b0
kthread+0xd6/0x100
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <daire.byrne@gmail.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Link: https://lore.kernel.org/r/20221117115023.1350181-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20221117142915.1366990-1-dwysocha@redhat.com/ # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Linux 6.1-rc8
|
|
syzkaller reported a KASAN use-after-free:
https://syzkaller.appspot.com/bug?extid=2ae90e873e97f1faf6f2
The referenced fuzzed image actually has two issues:
- m_pa == 0 as a non-inlined pcluster;
- The logical length is longer than its physical length.
The first issue has already been addressed. This patch addresses
the second issue by checking the extent length validity.
Reported-by: syzbot+2ae90e873e97f1faf6f2@syzkaller.appspotmail.com
Fixes: 02827e1796b3 ("staging: erofs: add erofs_map_blocks_iter")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221205150050.47784-2-hsiangkao@linux.alibaba.com
|
|
Otherwise, meta buffers could be leaked.
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221205150050.47784-1-hsiangkao@linux.alibaba.com
|
|
syzkaller reported a memleak:
https://syzkaller.appspot.com/bug?id=62f37ff612f0021641eda5b17f056f1668aa9aed
unreferenced object 0xffff88811009c7f8 (size 136):
...
backtrace:
[<ffffffff821db19b>] z_erofs_do_read_page+0x99b/0x1740
[<ffffffff821dee9e>] z_erofs_readahead+0x24e/0x580
[<ffffffff814bc0d6>] read_pages+0x86/0x3d0
...
syzkaller constructed a case: in z_erofs_register_pcluster(),
ztailpacking = false and map->m_pa = zero. This makes pcl->obj.index be
zero although pcl is not a inline pcluster.
Then following path adds refcount for grp, but the refcount won't be put
because pcl is inline.
z_erofs_readahead()
z_erofs_do_read_page() # for another page
z_erofs_collector_begin()
erofs_find_workgroup()
erofs_workgroup_get()
Since it's illegal for the block address of a non-inlined pcluster to
be zero, add check here to avoid registering the pcluster which would
be leaked.
Fixes: cecf864d3d76 ("erofs: support inline data decompression")
Reported-by: syzbot+6f8cd9a0155b366d227f@syzkaller.appspotmail.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/Y42Kz6sVkf+XqJRB@debian
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Convert all mapped erofs_bread() users to use kmap_local_page()
instead of kmap() or kmap_atomic().
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-and-tested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221018105313.4940-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Enable large folios for fscache mode. Enable this feature for
non-compressed format for now, until the compression part supports large
folios later.
One thing worth noting is that, the feature is not enabled for the meta
data routine since meta inodes don't need large folios for now, nor do
they support readahead yet.
Also document this new feature.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20221201074256.16639-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
When large folios supported, one folio can be split into several slices,
each of which may be mapped to META/UNMAPPED/MAPPED, and the folio can
be unlocked as a whole only when all slices have completed.
Thus always allocate erofs_fscache_request for each .read_folio() or
.readahead(), in which case the allocated request is responsible for
unlocking folios when all slices have completed.
As described above, each folio or folio range can be mapped into several
slices, while these slices may be mapped to different cookies, and thus
each slice needs its own netfs_cache_resources. Here we introduce
chained requests to support this, where each .read_folio() or
.readahead() calling can correspond to multiple requests. Each request
has its own netfs_cache_resources and thus is used to access one cookie.
Among these requests, there's a primary request, with the others
pointing to the primary request.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20221201074256.16639-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Switch to prepare_ondemand_read() interface and a self-contained request
completion to get rid of netfs_io_[request|subrequest].
The whole request will still be split into slices (subrequest) according
to the cache state of the backing file. As long as one of the
subrequests fails, the whole request will be marked as failed.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20221124034212.81892-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Add prepare_ondemand_read() callback dedicated for the on-demand read
scenario, so that callers from this scenario can be decoupled from
netfs_io_subrequest.
The original cachefiles_prepare_read() is now refactored to a generic
routine accepting a parameter list instead of netfs_io_subrequest.
There's no logic change, except that the debug id of subrequest and
request is removed from trace_cachefiles_prep_read().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20221124034212.81892-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
After commit 4c7e42552b3a ("erofs: remove useless cache strategy of
DELAYEDALLOC"), only one cached I/O allocation strategy is supported:
When cached I/O is preferred, page allocation is applied without
direct reclaim. If allocation fails, fall back to inplace I/O.
Let's get rid of z_erofs_cache_alloctype. No logical changes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221206060352.152830-1-xiang@kernel.org
|
|
When shared domain is enabled, doing mount twice with the same fsid and
domain_id will trigger sysfs warning as shown below:
sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin'
CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl+0x38/0x49
dump_stack+0x10/0x12
sysfs_warn_dup.cold+0x17/0x27
sysfs_create_dir_ns+0xb8/0xd0
kobject_add_internal+0xb1/0x240
kobject_init_and_add+0x71/0xa0
erofs_register_sysfs+0x89/0x110
erofs_fc_fill_super+0x98c/0xaf0
vfs_get_super+0x7d/0x100
get_tree_nodev+0x16/0x20
erofs_fc_get_tree+0x20/0x30
vfs_get_tree+0x24/0xb0
path_mount+0x2fa/0xa90
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The reason is erofs_fscache_register_cookie() doesn't guarantee the primary
data blob (aka fsid) is unique in the shared domain and
erofs_register_sysfs() invoked by the second mount will fail due to the
duplicated fsid in the shared domain and report warning.
It would be better to check the uniqueness of fsid before doing
erofs_register_sysfs(), so adding a new flags parameter for
erofs_fscache_register_cookie() and doing the uniqueness check if
EROFS_REG_COOKIE_NEED_NOEXIST is enabled.
After the patch, the error in dmesg for the duplicated mount would be:
erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221125110822.3812942-1-houtao@huaweicloud.com
Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Enable large folios for iomap mode. Then the readahead routine will
pass down large folios containing multiple pages.
Let's enable this for non-compressed format for now, until the
compression part supports large folios later.
When large folios supported, the iomap routine will allocate iomap_page
for each large folio and thus we need iomap_release_folio() and
iomap_invalidate_folio() to free iomap_page when these folios get
reclaimed or invalidated.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221130060455.44532-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
If the state manager thread fails to start, then we should just mark the
client initialisation as failed so that other processes or threads don't
get stuck in nfs_wait_client_init_complete().
Reported-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Fixes: 4697bd5e9419 ("NFSv4: Fix a race in the net namespace mount notification")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
940261a19508 introduced nfs_io_size() to clamp the iosize to a multiple
of PAGE_SIZE. This had the unintended side effect of no longer allowing
iosizes less than a page, which could be useful in some situations.
UDP already has an exception that causes it to fall back on the
power-of-two style sizes instead. This patch adds an additional
exception for very small iosizes.
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 940261a19508 ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Assume that the first segment will be a DATA segment, and place the data
directly into the xdr pages so it doesn't need to be shifted.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The scratch_buf array is 16 bytes, but I was passing 32 to the
xdr_set_scratch_buffer() function. Fix this by using sizeof(), which is
what I probably should have been doing this whole time.
Fixes: d3b00a802c84 ("NFS: Replace the READ_PLUS decoding code")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When the NFSv4 state manager recovers state after a server restart, it
reports that locks have been lost if it finds any lock state for which
recovery hasn't been successful. i.e. any for which
NFS_LOCK_INITIALIZED is not set.
However it only tries to recover locks that are still linked to
inode->i_flctx. So if a lock has been removed from inode->i_flctx, but
the state for that lock has not yet been destroyed, then a spurious
warning results.
nfs4_proc_unlck() calls locks_lock_inode_wait() - which removes the lock
from ->i_flctx - before sending the unlock request to the server and
before the final nfs4_put_lock_state() is called. This allows a window
in which a spurious warning can be produced.
So add a new flag NFS_LOCK_UNLOCKING which is set once the decision has
been made to unlock the lock. This will prevent it from triggering any
warning.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
According to commit "vfs: parse: deal with zero length string value",
kernel will set the param->string to null pointer in vfs_parse_fs_string()
if fs string has zero length.
Yet the problem is that, nfs_fs_context_parse_param() will dereferences the
param->string, without checking whether it is a null pointer, which may
trigger a null-ptr-deref bug.
This patch solves it by adding sanity check on param->string
in nfs_fs_context_parse_param().
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
After converting file f_flags to open context mode by flags_to_mode(), open
context mode will have FMODE_EXEC when file open for exec, so we check
FMODE_EXEC from open context mode.
No functional change, just simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Because file f_mode never have FMODE_EXEC, open context mode won't get
FMODE_EXEC from file f_mode. Open context mode only care about FMODE_READ/
FMODE_WRITE/FMODE_EXEC, and all info about open context mode can be convert
from file f_flags, so convert file f_flags to open context mode by
flags_to_mode().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Commit c412a97cf6c5 changed delete_work_func() to always perform an
inode lookup when gfs2_try_evict() fails. This doesn't make sense as a
gfs2_try_evict() failure indicates that the inode is likely still in
use. Revert that change.
Fixes: c412a97cf6c5 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add comment on when and why gfs2_cancel_delete_work() needs to be
skipped in gfs2_inode_lookup().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Those functions have reached a size at which having them inline isn't
useful anymore, so uninline them. In addition, report the glock name on
assertion failures.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
With the previous change, to simplify things, we can always just dequeue
and uninitialize the iopen glock in gfs2_evict_inode() even if it isn't
queued anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since commit 3d36e57ff768 ("gfs2: gfs2_create_inode rework"),
gfs2_evict_inode() and gfs2_create_inode() / gfs2_inode_lookup() will
synchronize via the inode hash table and we can be certain that once a
new inode is inserted into the inode hash table(), gfs2_evict_inode()
has completely destroyed any previous versions. We no longer need to
worry about overlapping inode object lifespans. Update the code and
comments accordingly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When a locking request fails, the associated glock holder is
automatically dequeued from the list of active and waiting holders. For
GL_ASYNC locking requests, this will obviously happen asynchronously
and it can race with attempts to cancel that locking request via
gfs2_glock_dq(). Therefore, don't forget to check if a locking request
has already been dequeued in gfs2_glock_dq().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This allows code like 'gl = gfs2_glock_hold(...)'.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Check if the inode size of stuffed (inline) inodes is within the allowed
range when reading inodes from disk (gfs2_dinode_in()). This prevents
us from on-disk corruption.
The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just
truncate inline data to the maximum allowed size don't actually make
sense, and they can be removed now as well.
Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In each of the two functions, add an inode variable that points to
&ip->i_inode and use that throughout the rest of the function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
An oops can be induced by running 'cat /proc/kcore > /dev/null' on
devices using pstore with the ram backend because kmap_atomic() assumes
lowmem pages are accessible with __va().
Unable to handle kernel paging request at virtual address ffffff807ff2b000
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000081d87000
[ffffff807ff2b000] pgd=180000017fe18003, p4d=180000017fe18003, pud=180000017fe18003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in: dm_integrity
CPU: 7 PID: 21179 Comm: perf Not tainted 5.15.67-10882-ge4eb2eb988cd #1 baa443fb8e8477896a370b31a821eb2009f9bfba
Hardware name: Google Lazor (rev3 - 8) (DT)
pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __memcpy+0x110/0x260
lr : vread+0x194/0x294
sp : ffffffc013ee39d0
x29: ffffffc013ee39f0 x28: 0000000000001000 x27: ffffff807ff2b000
x26: 0000000000001000 x25: ffffffc0085a2000 x24: ffffff802d4b3000
x23: ffffff80f8a60000 x22: ffffff802d4b3000 x21: ffffffc0085a2000
x20: ffffff8080b7bc68 x19: 0000000000001000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd3073f2e60
x14: ffffffffad588000 x13: 0000000000000000 x12: 0000000000000001
x11: 00000000000001a2 x10: 00680000fff2bf0b x9 : 03fffffff807ff2b
x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffffff802d4b4000 x4 : ffffff807ff2c000 x3 : ffffffc013ee3a78
x2 : 0000000000001000 x1 : ffffff807ff2b000 x0 : ffffff802d4b3000
Call trace:
__memcpy+0x110/0x260
read_kcore+0x584/0x778
proc_reg_read+0xb4/0xe4
During early boot, memblock reserves the pages for the ramoops reserved
memory node in DT that would otherwise be part of the direct lowmem
mapping. Pstore's ram backend reuses those reserved pages to change the
memory type (writeback or non-cached) by passing the pages to vmap()
(see pfn_to_page() usage in persistent_ram_vmap() for more details) with
specific flags. When read_kcore() starts iterating over the vmalloc
region, it runs over the virtual address that vmap() returned for
ramoops. In aligned_vread() the virtual address is passed to
vmalloc_to_page() which returns the page struct for the reserved lowmem
area. That lowmem page is passed to kmap_atomic(), which effectively
calls page_to_virt() that assumes a lowmem page struct must be directly
accessible with __va() and friends. These pages are mapped via vmap()
though, and the lowmem mapping was never made, so accessing them via the
lowmem virtual address oopses like above.
Let's side-step this problem by passing VM_IOREMAP to vmap(). This will
tell vread() to not include the ramoops region in the kcore. Instead the
area will look like a bunch of zeros. The alternative is to teach kmap()
about vmalloc areas that intersect with lowmem. Presumably such a change
isn't a one-liner, and there isn't much interest in inspecting the
ramoops region in kcore files anyway, so the most expedient route is
taken for now.
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221205233136.3420802-1-swboyd@chromium.org
|
|
When creating a new inode, there is a small chance that an inode lookup
for a previous version of the same inode is still in progress. In that
case, that previous lookup will eventually fail, but we may still need
to retry here.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
If the layout is invalid, then just log a '0' value.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Currently we print the transaction aborted message with a debug level, but
a transaction abort is an exceptional event that indicates something went
wrong and it's useful to have it printed with an error level as it helps
analysing problems in a production environment, where debug level messages
are typically not logged. For example reports from syzbot never include
the transaction aborted message, since the log level on the test machines
is above the debug level.
So change the log level from debug to error.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|