path: root/fs/erofs/zdata.c
Age | Commit message | Author
2024-11-18 | erofs: add sysfs node to drop internal caches | Chunhai Guo
Add a sysfs node to drop compression-related caches, currently used to drop in-memory pclusters and cached compressed folios. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241113041148.749129-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-11-18 | erofs: free pclusters if no cached folio is attached | Chunhai Guo
Once a pcluster is fully decompressed and there are no attached cached folios, its corresponding `struct z_erofs_pcluster` will be freed. This will significantly reduce the frequency of calls to erofs_shrink_scan() and the memory allocated for `struct z_erofs_pcluster`.

The tables below show approximately a 96% reduction in the calls to erofs_shrink_scan() and in the memory allocated for `struct z_erofs_pcluster` after applying this patch. The results were obtained by performing a test to copy a 4.1GB partition on ARM64 Android devices running the 6.6 kernel with an 8-core CPU and 12GB of memory.

1. The reduction in calls to erofs_shrink_scan():
+-----------------+-----------+----------+---------+
|                 | w/o patch | w/ patch |  diff   |
+-----------------+-----------+----------+---------+
| Average (times) |   11390   |   390    | -96.57% |
+-----------------+-----------+----------+---------+

2. The reduction in memory released by erofs_shrink_scan():
+-----------------+-----------+----------+---------+
|                 | w/o patch | w/ patch |  diff   |
+-----------------+-----------+----------+---------+
| Average (Byte)  | 133612656 | 4434552  | -96.68% |
+-----------------+-----------+----------+---------+

Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241112043235.546164-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-11-18 | erofs: sunset `struct erofs_workgroup` | Gao Xiang
`struct erofs_workgroup` was introduced to provide a unique header for all physically indexed objects. However, after big pclusters and shared pclusters are implemented upstream, it seems that all EROFS encoded data (which requires transformation) can be represented with `struct z_erofs_pcluster` directly. Move all members into `struct z_erofs_pcluster` for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-3-hsiangkao@linux.alibaba.com
2024-11-18 | erofs: move erofs_workgroup operations into zdata.c | Gao Xiang
Move related helpers into zdata.c as an intermediate step of getting rid of `struct erofs_workgroup`, and rename:

  erofs_workgroup_put => z_erofs_put_pcluster
  erofs_workgroup_get => z_erofs_get_pcluster
  erofs_try_to_release_workgroup => erofs_try_to_release_pcluster
  erofs_shrink_workstation => z_erofs_shrink_scan

Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-2-hsiangkao@linux.alibaba.com
2024-11-18 | erofs: get rid of erofs_{find,insert}_workgroup | Gao Xiang
Just fold them into the only two callers since they are simple enough. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-1-hsiangkao@linux.alibaba.com
2024-10-11 | erofs: get rid of z_erofs_try_to_claim_pcluster() | Gao Xiang
Just fold it into the caller for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241010090420.405871-1-hsiangkao@linux.alibaba.com
2024-09-12 | erofs: allocate more short-lived pages from reserved pool first | Chunhai Guo
This patch aims to allocate bvpages and short-lived compressed pages from the reserved pool first. After applying this patch, there are three benefits.

1. It reduces the page allocation time. The bvpages and short-lived compressed pages account for about 4% of the pages allocated from the system in the multi-app launch benchmarks [1]. It reduces the page allocation time accordingly and lowers the likelihood of blockage by page allocation in low memory scenarios.

2. The pages in the reserved pool will be allocated on demand. Currently, bvpages and short-lived compressed pages are short-lived pages allocated from the system, and the pages in the reserved pool all originate from short-lived pages. Consequently, the number of reserved pool pages will increase to z_erofs_rsv_nrpages over time. With this patch, all short-lived pages are allocated from the reserved pool first, so the number of reserved pool pages will only increase when there are not enough pages. Thus, even if z_erofs_rsv_nrpages is set to a large number for specific reasons, the actual number of reserved pool pages may remain low as per demand. In the multi-app launch benchmarks [1], z_erofs_rsv_nrpages is set at 256, while the number of reserved pool pages remains below 64.

3. When erofs cache decompression is disabled (EROFS_ZIP_CACHE_DISABLED), all pages will *only* be allocated from the reserved pool for erofs. This will significantly reduce the memory pressure from erofs.

[1] For additional details on the multi-app launch benchmarks, please refer to commit 0f6273ab4637 ("erofs: add a reserved buffer pool for lz4 decompression").

Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240906121110.3701889-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
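A minimal userspace sketch of the "reserved pool first" policy described above, not the kernel code. `struct rsv_pool`, `pool_alloc_page()` and `pool_free_page()` are hypothetical names; `max` only plays the role that z_erofs_rsv_nrpages has in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for the reserved pool: a LIFO free list capped at `max`. */
struct rsv_pool {
	void *pages[8];
	size_t nr;	/* pages currently cached in the pool */
	size_t max;	/* upper bound, playing the role of z_erofs_rsv_nrpages */
};

/* Try the reserved pool first; fall back to the system allocator on a miss. */
void *pool_alloc_page(struct rsv_pool *pool)
{
	if (pool->nr > 0)
		return pool->pages[--pool->nr];
	return malloc(PAGE_SZ);
}

/* Return a short-lived page: keep it in the pool if there is room, else free it. */
void pool_free_page(struct rsv_pool *pool, void *page)
{
	assert(pool->max <= sizeof(pool->pages) / sizeof(pool->pages[0]));
	if (pool->nr < pool->max)
		pool->pages[pool->nr++] = page;
	else
		free(page);
}
```

Because allocation drains the pool before touching the system allocator, the pool only grows when demand exceeds what it already holds, which matches the "allocated on demand" behaviour claimed in benefit 2.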
2024-09-12 | erofs: sunset unneeded NOFAILs | Gao Xiang
With iterative development, our codebase can now deal with compressed buffer misses properly if both in-place I/O and compressed buffer allocation fail. Note that if readahead fails (with non-uptodate folios), the original request will then fall back to synchronous read, and `.read_folio()` should return appropriate errnos; otherwise -EIO will be passed to user space, which is unexpected. To simplify rarely encountered failure paths, a mimic decompression will just be used. Before that, failure reasons are recorded in compressed_bvecs[] and they also act as placeholders to avoid in-place pages. They will be parsed just before decompression and then passed back to `.read_folio()`. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905084732.2684515-1-hsiangkao@linux.alibaba.com
2024-09-10 | erofs: support compressed inodes for fileio | Gao Xiang
Use pseudo bios just like the previous fscache approach since merged bio_vecs can be filled properly with unique interfaces. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-3-hsiangkao@linux.alibaba.com
2024-09-10 | erofs: support unencoded inodes for fileio | Gao Xiang
Since EROFS only needs to handle read requests in simple contexts, just directly use vfs_iocb_iter_read() for data I/Os. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905093031.2745929-1-hsiangkao@linux.alibaba.com
2024-09-10 | erofs: handle overlapped pclusters out of crafted images properly | Gao Xiang
syzbot reported a task hang issue due to a deadlock case where it is waiting for the folio lock of a cached folio that will be used for cache I/Os.

After looking into the crafted fuzzed image, I found it's formed with several overlapped big pclusters as below:

Ext:   logical offset   |  length :     physical offset    |  length
  0:        0..   16384 |   16384 :     151552..    167936 |   16384
  1:    16384..   32768 |   16384 :     155648..    172032 |   16384
  2:    32768..   49152 |   16384 :  537223168.. 537239552 |   16384
...

Here, extent 0/1 are physically overlapped although it's entirely _impossible_ for normal filesystem images generated by mkfs.

First, managed folios containing compressed data will be marked as up-to-date and then unlocked immediately (unlike in-place folios) when compressed I/Os are complete. If physical blocks are not submitted in the incremental order, there should be separate BIOs to avoid dependency issues. However, the current code mis-arranges z_erofs_fill_bio_vec() and BIO submission which causes unexpected BIO waits.

Second, managed folios will be connected to their own pclusters for efficient inter-queries. However, this is somewhat hard to implement easily if overlapped big pclusters exist. Again, these only appear in fuzzed images so let's simply fall back to temporary short-lived pages for correctness.

Additionally, it justifies that referenced managed folios cannot be truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy up `struct z_erofs_bvec`") for simplicity although it shouldn't make any difference.
Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Reported-by: syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com Reported-by: syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com Tested-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com Fixes: 8e6c8fa9f2e9 ("erofs: enable big pcluster feature") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240910070847.3356592-1-hsiangkao@linux.alibaba.com
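To make the overlap in the extent listing above concrete, here is a small sketch of a half-open range overlap check. It is only an illustration of why extents 0 and 1 collide physically; `struct extent` and `extents_overlap()` are hypothetical names and are not taken from the fix itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A mapped extent: only the physical side matters for the overlap check. */
struct extent {
	uint64_t pa;	/* physical start offset in bytes */
	uint64_t plen;	/* physical length in bytes */
};

/* Two half-open ranges [pa, pa + plen) overlap iff each starts before the other ends. */
bool extents_overlap(const struct extent *a, const struct extent *b)
{
	assert(a->plen > 0 && b->plen > 0);
	return a->pa < b->pa + b->plen && b->pa < a->pa + a->plen;
}
```

With the values from the listing, extent 0 covers [151552, 167936) and extent 1 covers [155648, 172032), so the check reports an overlap, while extent 2 is disjoint from both.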
2024-07-13 | erofs: silence uninitialized variable warning in z_erofs_scan_folio() | Dan Carpenter
Smatch complains that: fs/erofs/zdata.c:1047 z_erofs_scan_folio() error: uninitialized symbol 'err'. The issue is that if we hit the `if (!(map->m_flags & EROFS_MAP_MAPPED)) {` condition, then "err" isn't set. It's inside a loop so we would have to hit that condition on every iteration. Initialize "err" to zero to solve this. Fixes: 5b9654efb604 ("erofs: teach z_erofs_scan_folios() to handle multi-page folios") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/f78ab50e-ed6d-4275-8dd4-a4159fa565a2@stanley.mountain Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
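The pattern behind the warning can be shown with a reduced sketch. This is not the real z_erofs_scan_folio(); `scan_parts()` and `handle_mapped_part()` are hypothetical, and the `mapped[]` array merely stands in for the EROFS_MAP_MAPPED test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-part handler: returns 0 on success, -5 (-EIO) on error. */
int handle_mapped_part(int part)
{
	return part < 0 ? -5 : 0;
}

/*
 * `mapped[i]` says whether part i hits a mapped extent. If every part is
 * unmapped, the loop body never assigns `err`, so it must start at zero.
 */
int scan_parts(const int *parts, const int *mapped, size_t n)
{
	int err = 0;	/* without this initializer, the all-unmapped case is undefined */

	assert(parts && mapped);
	for (size_t i = 0; i < n; i++) {
		if (!mapped[i])
			continue;	/* unmapped: nothing to do, err untouched */
		err = handle_mapped_part(parts[i]);
		if (err)
			break;
	}
	return err;
}
```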
2024-07-11 | erofs: avoid refcounting short-lived pages | Gao Xiang
LZ4 always reuses the decompressed buffer as its LZ77 sliding window (dynamic dictionary) for optimal performance. However, in specific cases, the output buffer may not fully contain valid page cache pages, resulting in the use of short-lived pages for temporary purposes. Due to the limited sliding window size, LZ4 shortlived bounce pages can also be reused in a sliding manner, so each bounce page can be vmapped multiple times in different relative positions by design. In order to avoid double frees, currently, reuse counts are recorded via page refcount, but it will no longer be used as-is in the future world of Memdescs. Just maintain a lookup table to check if a shortlived page is reused. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240711053659.1364989-1-hsiangkao@linux.alibaba.com
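One way to picture "a lookup table instead of refcounts" is the following userspace sketch. It is an assumption about the general technique, not the kernel implementation: `free_bounce_pages()` and `MAX_BOUNCE` are hypothetical, and heap buffers stand in for bounce pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BOUNCE 16

/*
 * Free every distinct bounce page in `out[]` exactly once. The same pointer
 * may occupy several slots because a bounce page can be reused in a sliding
 * manner; `seen[]` is the lookup table that replaces per-page refcounts.
 * Returns the number of pages actually freed.
 */
size_t free_bounce_pages(void **out, size_t n)
{
	void *seen[MAX_BOUNCE];
	size_t nseen = 0;

	/* Pass 1: collect distinct pointers and clear the slots. */
	for (size_t i = 0; i < n; i++) {
		size_t j;

		if (!out[i])
			continue;
		for (j = 0; j < nseen; j++)
			if (seen[j] == out[i])
				break;
		if (j == nseen) {	/* first occurrence: remember it */
			assert(nseen < MAX_BOUNCE);
			seen[nseen++] = out[i];
		}
		out[i] = NULL;
	}
	/* Pass 2: free each distinct page once, so no double free can occur. */
	for (size_t j = 0; j < nseen; j++)
		free(seen[j]);
	return nseen;
}
```

The linear scan is acceptable here because the number of bounce pages is bounded by the sliding window size.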
2024-07-09 | erofs: refine z_erofs_{init,exit}_subsystem() | Gao Xiang
Introduce z_erofs_{init,exit}_decompressor() to unexport z_erofs_{deflate,lzma,zstd}_{init,exit}(). Besides, call them in z_erofs_{init,exit}_subsystem() for simplicity. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-2-hsiangkao@linux.alibaba.com
2024-07-09 | erofs: move each decompressor to its own source file | Gao Xiang
Thus *_config() function declarations can be avoided. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-1-hsiangkao@linux.alibaba.com
2024-07-08 | erofs: tidy up `struct z_erofs_bvec` | Gao Xiang
After revisiting the design, I believe `struct z_erofs_bvec` should be page-based instead of folio-based due to the reasons below:

- The minimized memory mapping block is a page;
- Under certain circumstances, only temporary pages need to be used instead of folios since refcount, mapcount for such pages are unnecessary;
- Decompressors handle all types of pages including temporary pages, not only folios.

When handling `struct z_erofs_bvec`, all folio-related information is now accessed using the page_folio() helper. The final goal of this round of adaptation is to eliminate direct accesses to `struct page` in the EROFS codebase, except for some exceptions like `z_erofs_is_shortlived_page()` and `z_erofs_page_is_invalidated()`, which require a new helper to determine the memdesc type of an arbitrary page. Actually large folios of compressed files seem to work now, yet I tend to conduct more tests before officially enabling this for all scenarios.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-4-hsiangkao@linux.alibaba.com
2024-07-08 | erofs: teach z_erofs_scan_folios() to handle multi-page folios | Gao Xiang
Previously, a folio just contains one page. In order to enable large folios, z_erofs_scan_folios() needs to handle multi-page folios. First, this patch eliminates all gotos. Instead, the new loop deals with multiple parts in each folio. It's simple to handle the parts which belong to unmapped extents or fragment extents; but for encoded extents, the page boundaries need to be considered for `tight` and `split` to keep inplace I/Os working correctly: when a part crosses the page boundary, they need to be reset properly. Besides, simplify `tight` derivation since Z_EROFS_PCLUSTER_HOOKED has been removed for quite a while. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-3-hsiangkao@linux.alibaba.com
2024-07-08 | erofs: convert z_erofs_read_fragment() to folios | Gao Xiang
Just a straight-forward conversion. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-2-hsiangkao@linux.alibaba.com
2024-07-08 | erofs: convert z_erofs_pcluster_readmore() to folios | Gao Xiang
Unlike `pagecache_get_page()`, `__filemap_get_folio()` returns error pointers instead of NULL, thus switching to `IS_ERR_OR_NULL`. Apart from that, it's just a straightforward conversion. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-1-hsiangkao@linux.alibaba.com
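The difference between a NULL-returning and an error-pointer-returning lookup can be illustrated in userspace. The helpers below imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR()/IS_ERR_OR_NULL() idea with lowercase, hypothetical names; `lookup_folio()` is an invented stand-in and is not `__filemap_get_folio()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the last MAX_ERRNO values of the address space. */
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* Hypothetical lookup that fails the new way: an encoded errno, never NULL. */
void *lookup_folio(int *cache, bool present)
{
	assert(cache);
	return present ? (void *)cache : err_ptr(-ENOENT);
}
```

A caller that only tests `if (!folio)` would treat the failed lookup as a valid object, which is why the check has to change together with the lookup function.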
2024-05-18 | z_erofs_pcluster_begin(): don't bother with rounding position down | Al Viro
... and be more idiomatic when calculating ->pageofs_in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425200017.GF1031757@ZenIV [ Gao Xiang: don't use `offset_in_page(mptr)` due to EROFS_NO_KMAP. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-18 | erofs: mechanically convert erofs_read_metabuf() to offsets | Al Viro
just lift the call of erofs_pos() into the callers; it will collapse in most of them, but that's better done caller-by-caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425195846.GC1031757@ZenIV Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-25 | erofs_buf: store address_space instead of inode | Al Viro
... seeing that ->i_mapping is the only thing we want from the inode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-07 | erofs: switch erofs_bread() to passing offset instead of block number | Al Viro
Callers are happier that way, especially since we no longer need to play with splitting offset into block number and offset within block, passing the former to erofs_bread(), then adding the latter... erofs_bread() always reads entire pages, anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
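The split-then-recombine dance that the message wants callers to stop doing looks roughly like this. It is a sketch only: `struct blk_pos`, `split_offset()` and `join_offset()` are hypothetical, and `blkszbits` is assumed to be the log2 of the block size.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a byte inside a block-addressed device. */
struct blk_pos {
	uint64_t blknr;		/* block number */
	uint32_t ofs;		/* offset within that block */
};

/* Old calling convention: callers split the byte offset themselves. */
struct blk_pos split_offset(uint64_t offset, unsigned int blkszbits)
{
	struct blk_pos pos;

	assert(blkszbits < 32);
	pos.blknr = offset >> blkszbits;
	pos.ofs = (uint32_t)(offset & ((1ULL << blkszbits) - 1));
	return pos;
}

/* What the callee needs either way: the byte offset, recombined. */
uint64_t join_offset(struct blk_pos pos, unsigned int blkszbits)
{
	return (pos.blknr << blkszbits) + pos.ofs;
}
```

Since the two halves are always added back together, passing the byte offset directly removes one round trip from every caller.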
2024-03-10 | erofs: support compressed inodes over fscache | Jingbo Xu
Since fscache can utilize iov_iter to write dest buffers, bio_vec can be used in this way too. To simplify this, pseudo bios are prepared and bio_vec will be filled with bio_add_page(). And a common .bi_end_io will be called directly to handle I/O completions. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240308094159.40547-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-10 | erofs: refine managed cache operations to folios | Gao Xiang
Convert erofs_try_to_free_all_cached_pages() and z_erofs_cache_release_folio(). Besides, erofs_page_is_managed() is moved to zdata.c and renamed as erofs_folio_is_managed(). Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-6-hsiangkao@linux.alibaba.com
2024-03-10 | erofs: convert z_erofs_submissionqueue_endio() to folios | Gao Xiang
Use bio_for_each_folio() to iterate over each folio in the bio and there are no large folios for now. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-5-hsiangkao@linux.alibaba.com
2024-03-10 | erofs: convert z_erofs_fill_bio_vec() to folios | Gao Xiang
Introduce a folio member to `struct z_erofs_bvec` and convert most of z_erofs_fill_bio_vec() to folios, which is still straight-forward. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-4-hsiangkao@linux.alibaba.com
2024-03-10 | erofs: get rid of `justfound` debugging tag | Gao Xiang
`justfound` is introduced to identify cached folios that are just added to compressed bvecs so that more checks can be applied in the I/O submission path. EROFS is now quite stable compared to the codebase at that stage. `justfound` becomes a burden for upcoming features. Drop it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-3-hsiangkao@linux.alibaba.com
2024-03-10 | erofs: convert z_erofs_do_read_page() to folios | Gao Xiang
It is a straight-forward conversion. Besides, it's renamed as z_erofs_scan_folio(). Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-2-hsiangkao@linux.alibaba.com
2024-03-10 | erofs: convert z_erofs_onlinepage_.* to folios | Gao Xiang
Online folios are locked file-backed folios which will eventually keep decoded (e.g. decompressed) data of each inode for end users to utilize. It may belong to a few pclusters and contain other data (e.g. compressed data for inplace I/Os) temporarily in a time-sharing manner to reduce memory footprints for low-end storage devices with high latencies under heavy I/O pressure. Apart from folio_end_read() usage, it's a straight-forward conversion. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-1-hsiangkao@linux.alibaba.com
2024-01-27 | erofs: relaxed temporary buffers allocation on readahead | Chunhai Guo
Even with inplace decompression, sometimes very few temporary buffers may be still needed for a single decompression shot (e.g. 16 pages for 64k sliding window or 4 pages for 16k sliding window). In low-memory scenarios, it would be better to try to allocate with GFP_NOWAIT on readahead first. That can help reduce the time spent on page allocation under sustained memory pressure.

Here are detailed performance numbers under multi-app launch benchmark workload [1] on ARM64 Android devices (8-core CPU and 8GB of memory) running a 5.15 LTS kernel with EROFS of 4k pclusters:

+----------------------------------------------+
|      LZ4       | vanilla | patched |  diff   |
|----------------+---------+---------+---------|
|  Average (ms)  |  3364   |  2684   | -20.21% | [64k sliding window]
|----------------+---------+---------+---------|
|  Average (ms)  |  2079   |  1610   | -22.56% | [16k sliding window]
+----------------------------------------------+

The total size of system images for 4k pclusters is almost unchanged: (64k sliding window) 9,117,044 KB (16k sliding window) 9,113,096 KB

Therefore, in addition to switching the sliding window from 64k to 16k, after applying this patch, it can eventually save 52.14% (3364 -> 1610) on average with no memory reservation. That is particularly useful for embedded devices with limited resources.

[1] https://lore.kernel.org/r/20240109074143.4138783-1-guochunhai@vivo.com

Suggested-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20240126140142.201718-1-hsiangkao@linux.alibaba.com
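The figures quoted in this message can be re-derived with two small helpers. This is just arithmetic for the reader; `window_pages()` and `percent_change()` are hypothetical names, and the page counts assume 4k pages, which is what "16 pages for 64k" implies.

```c
#include <assert.h>

/* Temporary pages needed to cover an LZ4 sliding window, rounding up. */
unsigned int window_pages(unsigned int window_bytes, unsigned int page_bytes)
{
	assert(page_bytes > 0);
	return (window_bytes + page_bytes - 1) / page_bytes;
}

/* Relative change from `before` to `after`, in percent (negative = faster). */
double percent_change(double before, double after)
{
	assert(before > 0);
	return (after - before) / before * 100.0;
}
```

For example, `percent_change(3364, 1610)` gives roughly -52.14, which is the combined saving from the smaller window plus the patch.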
2024-01-26 | erofs: fix infinite loop due to a race of filling compressed_bvecs | Gao Xiang
I encountered a race issue after lengthy (~594647 secs) stress tests on a 64k-page arm64 VM with several 4k-block EROFS images. The timing is like below:

z_erofs_try_inplace_io                  z_erofs_fill_bio_vec
  cmpxchg(&compressed_bvecs[].page,
          NULL, ..)
                                        [access bufvec]
  compressed_bvecs[] = *bvec;

Previously, z_erofs_submit_queue() just accessed bufvec->page only, so other fields in bufvec didn't matter. After the subpage block support is landed, .offset and .end can be used too, but filling bufvec isn't an atomic operation which can cause inconsistency.

Let's use a spinlock to keep the atomicity of each bufvec. More specifically, just reuse the existing spinlock `pcl->obj.lockref.lock` since it's rarely used (also it takes a short time if even used) as long as the pcluster has a reference.

Fixes: 192351616a9d ("erofs: support I/O submission for sub-page compressed blocks") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Link: https://lore.kernel.org/r/20240125120039.3228103-1-hsiangkao@linux.alibaba.com
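The idea of filling and reading a multi-field bvec under one lock can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel fix: a tiny C11 `atomic_flag` spinlock replaces the lockref spinlock, and `struct bvec`, `struct pcluster`, `pcl_try_fill()` and `pcl_read()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-in for a compressed bvec: three fields that must agree. */
struct bvec {
	void *page;
	int offset;
	int end;
};

struct pcluster {
	atomic_flag lock;	/* tiny spinlock guarding bvecs[] */
	struct bvec bvecs[4];
};

static void pcl_lock(struct pcluster *pcl)
{
	while (atomic_flag_test_and_set_explicit(&pcl->lock, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void pcl_unlock(struct pcluster *pcl)
{
	atomic_flag_clear_explicit(&pcl->lock, memory_order_release);
}

/* Writer: claim slot i only if it is empty, filling all fields under the lock. */
int pcl_try_fill(struct pcluster *pcl, size_t i, struct bvec bv)
{
	int claimed = 0;

	assert(i < 4);
	pcl_lock(pcl);
	if (!pcl->bvecs[i].page) {
		pcl->bvecs[i] = bv;
		claimed = 1;
	}
	pcl_unlock(pcl);
	return claimed;
}

/* Reader: take a consistent snapshot of slot i. */
struct bvec pcl_read(struct pcluster *pcl, size_t i)
{
	struct bvec bv;

	assert(i < 4);
	pcl_lock(pcl);
	bv = pcl->bvecs[i];
	pcl_unlock(pcl);
	return bv;
}
```

Unlike a cmpxchg on `.page` followed by a separate struct copy, a reader here can never observe a slot whose page is set while `.offset` and `.end` are still stale.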
2024-01-25 | erofs: get rid of unneeded GFP_NOFS | Jingbo Xu
Clean up some leftovers since there is no way for EROFS to be called again from a reclaim context. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240124031945.130782-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-12-21 | erofs: allow partially filled compressed bvecs | Yue Hu
In order to reduce memory footprints even further, let's allow partially filled compressed bvecs for readahead to bail out later. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231221062341.23901-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-12-18 | erofs: enable sub-page compressed block support | Gao Xiang
Let's just disable cached decompression and inplace I/Os for partial pages as the first step in order to enable sub-page block initial support. In other words, currently it works primarily based on temporary short-lived pages. Don't expect too much in terms of performance. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-6-hsiangkao@linux.alibaba.com
2023-12-18 | erofs: fix ztailpacking for subpage compressed blocks | Gao Xiang
`pageofs_in` should be the compressed data offset of the page rather than of the block. Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231214161337.753049-1-hsiangkao@linux.alibaba.com
2023-12-15 | erofs: record `pclustersize` in bytes instead of pages | Gao Xiang
Currently, compressed sizes are recorded in pages using `pclusterpages`. However, for tailpacking pclusters, `tailpacking_size` is used instead. This approach doesn't work when dealing with sub-page blocks. To address this, let's switch them to the unified `pclustersize` in bytes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-3-hsiangkao@linux.alibaba.com
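A short sketch of why a page count cannot describe a sub-page pcluster. The helper names are hypothetical, and the 4k-block-on-64k-page configuration is used only as an example of a sub-page setup.

```c
#include <assert.h>

/* Old bookkeeping: compressed size counted in whole pages. */
unsigned int size_from_pages(unsigned int pclusterpages, unsigned int page_bytes)
{
	return pclusterpages * page_bytes;
}

/* Page count needed to hold `pclustersize` bytes; lossy for sub-page sizes. */
unsigned int pages_for_size(unsigned int pclustersize, unsigned int page_bytes)
{
	assert(page_bytes > 0);
	return (pclustersize + page_bytes - 1) / page_bytes;
}
```

A 4096-byte pcluster on 65536-byte pages rounds up to one page, and converting that page count back yields 65536 bytes, so the real compressed size is lost unless it is kept in bytes.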
2023-12-15 | erofs: support I/O submission for sub-page compressed blocks | Gao Xiang
Add a basic I/O submission path first to support sub-page blocks:

- Temporary short-lived pages will be used entirely;
- In-place I/O pages can be used partially, but compressed pages need to be able to be mapped in contiguous virtual memory.

As a start, currently cache decompression is explicitly disabled for sub-page blocks, which will be supported in the future.

Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-2-hsiangkao@linux.alibaba.com
2023-12-15 | erofs: fix memory leak on short-lived bounced pages | Gao Xiang
Both MicroLZMA and DEFLATE algorithms can use short-lived pages on demand for the overlapped inplace I/O decompression. However, those short-lived pages are actually added to `be->compressed_pages`. Thus, it should be checked instead of `pcl->compressed_bvecs`. The LZ4 algorithm doesn't work like this, so it won't be impacted. Fixes: 67139e36d970 ("erofs: introduce `z_erofs_parse_in_bvecs'") Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231128180431.4116991-1-hsiangkao@linux.alibaba.com
2023-10-31 | erofs: fix erofs_insert_workgroup() lockref usage | Gao Xiang
As Linus pointed out [1], lockref_put_return() is fundamentally designed to be something that can fail. It behaves as a fastpath-only thing, and the failure case needs to be handled anyway. Actually, since the new pcluster was just allocated without being populated, it won't be accessed by others until it is inserted into XArray, so lockref helpers are actually unneeded here. Let's just set the proper reference count on initializing. [1] https://lore.kernel.org/r/CAHk-=whCga8BeQnJ3ZBh_Hfm9ctba_wpF444LpwRybVNMzO6Dw@mail.gmail.com Fixes: 7674a42f35ea ("erofs: use struct lockref to replace handcrafted approach") Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231031060524.1103921-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
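The "set the count at initialization, then publish" ordering can be sketched with C11 atomics. Everything here is an assumption for illustration: a single atomic slot stands in for the XArray, the initial count of 1 represents the creator's own reference, and `struct object`, `struct index`, `object_init()` and `index_insert()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct object {
	atomic_int refcount;
};

/* Hypothetical one-slot "index" standing in for the XArray. */
struct index {
	_Atomic(struct object *) slot;
};

/*
 * The object is private until it is published, so its count can be set with
 * a plain initialization: one reference held by the creator.
 */
void object_init(struct object *obj)
{
	atomic_init(&obj->refcount, 1);
}

/* Publish; returns the winner (ours, or one that was inserted earlier). */
struct object *index_insert(struct index *idx, struct object *obj)
{
	struct object *expected = NULL;

	assert(obj);
	if (atomic_compare_exchange_strong(&idx->slot, &expected, obj))
		return obj;
	/* Lost the race: take a reference on the already published object. */
	atomic_fetch_add(&expected->refcount, 1);
	return expected;
}
```

No fallible fast-path helper is needed for the new object, because nobody else can see it before the compare-and-exchange succeeds.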
2023-08-23 | erofs: release ztailpacking pclusters properly | Jingbo Xu
Currently ztailpacking pclusters are chained with FOLLOWED_NOINPLACE and not recorded into the managed_pslots XArray. After commit 7674a42f35ea ("erofs: use struct lockref to replace handcrafted approach"), ztailpacking pclusters won't be freed with erofs_workgroup_put() anymore, which will cause the following issue: BUG erofs_pcluster-1 (Tainted: G OE ): Objects remaining in erofs_pcluster-1 on __kmem_cache_shutdown() Use z_erofs_free_pcluster() directly to free ztailpacking pclusters. Fixes: 7674a42f35ea ("erofs: use struct lockref to replace handcrafted approach") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230822110530.96831-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-23 | erofs: adapt folios for z_erofs_read_folio() | Gao Xiang
It's a straight-forward conversion and no logic changes (except that it renames the corresponding tracepoint.) Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817083942.103303-1-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: adapt folios for z_erofs_readahead() | Gao Xiang
It's a straight-forward conversion except that readahead_folio() will do folio_put() in advance but it doesn't matter since folios are still locked. As before, since file-backed folios (pages for now) are locked, we could temporarily use folio->private as an internal counter to indicate split parts of each folio for the corresponding pclusters to decompress. When such counter becomes zero, the folio will be finally unlocked (see compress.h and z_erofs_onlinepage_endio()). Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-7-hsiangkao@linux.alibaba.com
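The counter scheme described above can be modelled in a few lines. This is a userspace sketch under assumptions: `struct folio` here is a toy type, the `onlinefolio_*` names are hypothetical, and the initial bias of one is an assumption added so the count cannot hit zero while parts are still being attached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal stand-in for a locked file-backed folio. */
struct folio {
	atomic_int private;	/* outstanding split parts, plus one submission bias */
	bool locked;
};

/* Start tracking: hold one bias reference while parts are still being added. */
void onlinefolio_init(struct folio *folio)
{
	atomic_init(&folio->private, 1);
	folio->locked = true;
}

/* Account for one more part that some pcluster will decompress into the folio. */
void onlinefolio_split(struct folio *folio)
{
	atomic_fetch_add(&folio->private, 1);
}

/* Drop one part (or the bias); whoever reaches zero unlocks the folio. */
void onlinefolio_end(struct folio *folio)
{
	int old = atomic_fetch_sub(&folio->private, 1);

	assert(old > 0);
	if (old == 1)
		folio->locked = false;
}
```

A folio split across two pclusters therefore stays locked until the submitter drops its bias and both decompressions have called `onlinefolio_end()`.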
2023-08-23 | erofs: get rid of fe->backmost for cache decompression | Gao Xiang
EROFS_MAP_FULL_MAPPED is more accurate to decide if caching the last incomplete pcluster for later read or not. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-6-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: drop z_erofs_page_mark_eio() | Gao Xiang
It can be folded into z_erofs_onlinepage_endio() to simplify the code. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-5-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: tidy up z_erofs_do_read_page() | Gao Xiang
- Fix a typo: spiltted => split;
- Move !EROFS_MAP_MAPPED and EROFS_MAP_FRAGMENT upwards;
- Increase `split` in advance to avoid unnecessary repeats.

Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-4-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: move preparation logic into z_erofs_pcluster_begin() | Gao Xiang
Some preparation logic should be part of z_erofs_pcluster_begin() instead of z_erofs_do_read_page(). Let's move now. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-3-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: avoid obsolete {collector,collection} terms | Gao Xiang
{collector,collection} were once reserved in order to indicate different runtime logical extent instances of multi-reference pclusters. However, de-duplicated decompression has been landed in a more flexible way, thus `struct z_erofs_collection` was formally removed in commit 87ca34a7065d ("erofs: get rid of `struct z_erofs_collection'").

Let's handle the remaining leftovers, for example:

  `z_erofs_collector_begin` => `z_erofs_pcluster_begin`
  `z_erofs_collector_end` => `z_erofs_pcluster_end`

as well as some comments. No logic changes.

Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-2-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: simplify z_erofs_read_fragment() | Gao Xiang
A trivial cleanup to make the fragment handling logic clearer. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-1-hsiangkao@linux.alibaba.com
2023-08-23 | erofs: refine warning messages for zdata I/Os | Ferry Meng
Don't warn users since -EINTR can be returned due to user interruption. Also suppress warning messages of readmore. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230809060637.21311-1-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>