summaryrefslogtreecommitdiff
path: root/fs/erofs/super.c
AgeCommit message (Collapse)Author
14 dayserofs: Zstandard compression supportGao Xiang
Add Zstandard compression as the 4th supported algorithm since it becomes more popular now and some end users have asked this for quite a while [1][2]. Each EROFS physical cluster contains only one valid standard Zstandard frame as described in [3] so that decompression can be performed on a per-pcluster basis independently. Currently, it just leverages multi-call stream decompression APIs with internal sliding window buffers. One-shot or bufferless decompression could be implemented later for even better performance if needed. [1] https://github.com/erofs/erofs-utils/issues/6 [2] https://lore.kernel.org/r/Y08h+z6CZdnS1XBm@B-P7TQMD6M-0146.lan [3] https://www.rfc-editor.org/rfc/rfc8478.txt Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240508234453.17896-1-xiang@kernel.org
2024-05-08erofs: derive fsid from on-disk UUID for .statfs() if possibleHongzhen Luo
Use the superblock's UUID to generate the fsid when it's non-null. Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08erofs: rename per-CPU buffers to global buffer pool and make it configurableChunhai Guo
It will cost more time if compressed buffers are allocated on demand for low-latency algorithms (like lz4) so EROFS uses per-CPU buffers to keep compressed data if in-place decompression is unfulfilled. While it is kind of wasteful of memory for a device with hundreds of CPUs, and only a small number of CPUs concurrently decompress most of the time. This patch renames it as 'global buffer pool' and makes it configurable. This allows two or more CPUs to share a common buffer to reduce memory occupation. Suggested-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com Signed-off-by: Sandeep Dhavale <dhavale@google.com> Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28erofs: reliably distinguish block based and fscache modeChristian Brauner
When erofs_kill_sb() is called in block dev based mode, s_bdev may not have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled, it will be mistaken for fscache mode, and then attempt to free an anon_dev that has never been allocated, triggering the following warning: ============================================ ida_free called for id=0 which is not allocated. WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140 Modules linked in: CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630 RIP: 0010:ida_free+0x134/0x140 Call Trace: <TASK> erofs_kill_sb+0x81/0x90 deactivate_locked_super+0x35/0x80 get_tree_bdev+0x136/0x1e0 vfs_get_tree+0x2c/0xf0 do_new_mount+0x190/0x2f0 [...] ============================================ Now when erofs_kill_sb() is called, erofs_sb_info must have been initialised, so use sbi->fsid to distinguish between the two modes. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28erofs: get rid of erofs_fs_contextBaokun Li
Instead of allocating the erofs_sb_info in fill_super() allocate it during erofs_init_fs_context() and ensure that erofs can always have the info available during erofs_kill_sb(). After this erofs_fs_context is no longer needed, replace ctx with sbi, no functional changes. Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240419123611.947084-2-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-25erofs: drop experimental warning for FSDAXGao Xiang
As EXT4/XFS filesystems, FSDAX functionality is considered to be stable. Let's drop this warning. Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240325005116.106351-1-hsiangkao@linux.alibaba.com
2024-03-12mm, slab: remove last vestiges of SLAB_MEM_SPREADLinus Torvalds
Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-12Merge tag 'erofs-for-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, we introduce compressed inode support over fscache since a lot of native EROFS images are explicitly compressed so that EROFS over fscache can be more widely used even without Dragonfly Nydus [1]. Apart from that, there are some folio conversions for compressed inodes available as well as a lockdep false positive fix. Summary: - Some folio conversions for compressed inodes; - Add compressed inode support over fscache; - Fix lockdep false positives of erofs_pseudo_mnt" Link: https://nydus.dev [1] * tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support compressed inodes over fscache erofs: make iov_iter describe target buffers over fscache erofs: fix lockdep false positives on initializing erofs_pseudo_mnt erofs: refine managed cache operations to folios erofs: convert z_erofs_submissionqueue_endio() to folios erofs: convert z_erofs_fill_bio_vec() to folios erofs: get rid of `justfound` debugging tag erofs: convert z_erofs_do_read_page() to folios erofs: convert z_erofs_onlinepage_.* to folios
2024-03-10erofs: fix lockdep false positives on initializing erofs_pseudo_mntBaokun Li
Lockdep reported the following issue when mounting erofs with a domain_id: ============================================ WARNING: possible recursive locking detected 6.8.0-rc7-xfstests #521 Not tainted -------------------------------------------- mount/396 is trying to acquire lock: ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 but task is already holding lock: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&type->s_umount_key#50/1); lock(&type->s_umount_key#50/1); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by mount/396: #0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 #1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3}, at: erofs_fscache_register_fs+0x3d/0x270 [erofs] stack backtrace: CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521 Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 validate_chain+0x5c4/0xa00 __lock_acquire+0x6a9/0xd50 lock_acquire+0xcd/0x2b0 down_write_nested+0x45/0xd0 alloc_super+0xe3/0x3d0 sget_fc+0x62/0x2f0 vfs_get_super+0x21/0x90 vfs_get_tree+0x2c/0xf0 fc_mount+0x12/0x40 vfs_kern_mount.part.0+0x75/0x90 kern_mount+0x24/0x40 erofs_fscache_register_fs+0x1ef/0x270 [erofs] erofs_fc_fill_super+0x213/0x380 [erofs] This is because the file_system_type of both erofs and the pseudo-mount point of domain_id is erofs_fs_type, so two successive calls to alloc_super() are considered to be using the same lock and trigger the warning above. Therefore add a nodev file_system_type called erofs_anon_fs_type in fscache.c to silence this complaint. Because kern_mount() takes a pointer to struct file_system_type, not its (string) name. So we don't need to call register_filesystem(). In addition, call init_pseudo() in erofs_anon_init_fs_context() as suggested by Al Viro, so that we can remove erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and erofs_anon_context_ops. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: a9849560c55e ("erofs: introduce a pseudo mnt to manage shared cookies") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-and-tested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yang Erkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20240307101018.2021925-1-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-02-25erofs: port device access to fileChristian Brauner
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-20-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-10erofs: make erofs_{err,info}() support NULL sb parameterChunhai Guo
Make erofs_err() and erofs_info() support NULL sb parameter for more general usage. Suggested-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Link: https://lore.kernel.org/r/20240103123202.3054718-1-guochunhai@vivo.com Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-11-07Merge tag 'vfs-6.7.fsid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles
2023-11-02Merge tag 'erofs-for-6.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Nothing exciting lands for this cycle, since we're still busying in developing support for sub-page blocks and large-folios of compressed data for new scenarios on Android. In this cycle, MicroLZMA format is marked as stable, and there are minor cleanups around documentation and codebase. In addition, it also fixes incorrect lockref usage in erofs_insert_workgroup(). Summary: - Fix inode metadata space layout documentation - Avoid warning for MicroLZMA format anymore - Fix erofs_insert_workgroup() lockref usage - Some cleanups" * tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix erofs_insert_workgroup() lockref usage erofs: tidy up redundant includes erofs: get rid of ROOT_NID() erofs: simplify compression configuration parser erofs: don't warn MicroLZMA format anymore erofs: fix inode metadata space layout description in documentation
2023-10-31erofs: tidy up redundant includesFerry Meng
- Remove unused includes like <linux/parser.h> and <linux/prefetch.h>; - Move common includes into "internal.h". Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231026021627.23284-2-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-10-31erofs: get rid of ROOT_NID()Ferry Meng
Let's open code this helper for simplicity. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231026021627.23284-1-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-10-31erofs: simplify compression configuration parserGao Xiang
Move erofs_load_compr_cfgs() into decompressor.c as well as introduce a callback instead of a hard-coded switch for each algorithm for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231022130957.11398-1-xiang@kernel.org
2023-10-28exportfs: make ->encode_fh() a mandatory method for NFS exportAmir Goldstein
Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28erofs: Convert to use bdev_open_by_path()Jan Kara
Convert erofs to use bdev_open_by_path() and pass the handle around. CC: Gao Xiang <xiang@kernel.org> CC: Chao Yu <chao@kernel.org> CC: linux-erofs@lists.ozlabs.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-21-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-19erofs: allow empty device tags in flatdev modeJingbo Xu
Device tags aren't actually required in flatdev mode, thus fix mount failure due to empty device tags in flatdev mode. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Fixes: 8b465fecc35a ("erofs: support flattened block device for multi-blob images") Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230915082728.56588-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-23erofs: don't warn dedupe and fragments features anymoresunshijie
The `dedupe` and `fragments` features have been merged for a year. They are mostly stable now. Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: sunshijie <sunshijie@xiaomi.com> Link: https://lore.kernel.org/r/20230821041737.2673401-1-sunshijie@xiaomi.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-23erofs: remove redundant erofs_fs_type declaration in super.cFerry Meng
As erofs_fs_type has been declared in internal.h, there is no use to declare repeatedly in super.c. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> eviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230815094849.53249-3-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-23erofs: add necessary kmem_cache_create flags for erofs inode cacheFerry Meng
To improve memory access efficiency and enable statistics functionality, add SLAB_MEM_SPREAD and SLAB_ACCOUNT flag during erofs_inode_cachep's allocation time. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230815094849.53249-2-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-23erofs: clean up redundant comment and adjust code alignmentFerry Meng
Remove some redundant comments in erofs/super.c, and avoid unncessary line breaks for cleanup. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230815094849.53249-1-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-11erofs: boost negative xattr lookup with bloom filterJingbo Xu
Optimise the negative xattr lookup with bloom filter. The bit value for the bloom filter map has a reverse semantics for compatibility. That is, the bit value of 0 indicates existence, while the bit value of 1 indicates the absence of corresponding xattr. The initial version is _only_ enabled when xattr_filter_reserved is zero. The filter map internals may change in the future, in which case the reserved flag will be set non-zero and we don't need bothering the compatible bits again at that time. For now disable the optimization if this reserved flag is non-zero. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230722094538.11754-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-08-11erofs: DEFLATE compression supportGao Xiang
Add DEFLATE compression as the 3rd supported algorithm. DEFLATE is a popular generic-purpose compression algorithm for quite long time (many advanced formats like gzip, zlib, zip, png are all based on that) as Apple documentation written "If you require interoperability with non-Apple devices, use COMPRESSION_ZLIB. [1]". Due to its popularity, there are several hardware on-market DEFLATE accelerators, such as (s390) DFLTCC, (Intel) IAA/QAT, (HiSilicon) ZIP accelerator, etc. In addition, there are also several high-performence IP cores and even open-source FPGA approches available for DEFLATE. Therefore, it's useful to support DEFLATE compression in order to find a way to utilize these accelerators for asynchronous I/Os and get benefits from these later. Besides, it's a good choice to trade off between compression ratios and performance compared to LZ4 and LZMA. The DEFLATE core format is simple as well as easy to understand, therefore the code size of its decompressor is small even for the bootloader use cases. The runtime memory consumption is quite limited too (e.g. 32K + ~7K for each zlib stream). As usual, EROFS ourperforms similar approaches too. Alternatively, DEFLATE could still be used for some specific files since EROFS supports multiple compression algorithms in one image. [1] https://developer.apple.com/documentation/compression/compression_algorithm Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230810154859.118330-1-hsiangkao@linux.alibaba.com
2023-08-01erofs: drop unnecessary WARN_ON() in erofs_kill_sb()Gao Xiang
Previously, .kill_sb() will be called only after fill_super fails. It will be changed [1]. Besides, checking for s_magic in erofs_kill_sb() is unnecessary from any point of view. Let's get rid of it now. [1] https://lore.kernel.org/r/20230731-flugbereit-wohnlage-78acdf95ab7e@brauner Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801014737.28614-1-hsiangkao@linux.alibaba.com
2023-06-26Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ...
2023-06-22erofs: remove unnecessary gotoYangtao Li
It's redundant, let's remove it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20230615034539.14286-1-frank.li@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-29erofs: adapt managed inode operations into foliosGao Xiang
This patch gets rid of erofs_try_to_free_cached_page() and fold it into .release_folio(). It also moves managed inode operations into zdata.c, which simplifies the code a bit. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-5-hsiangkao@linux.alibaba.com
2023-04-17erofs: enable long extended attribute name prefixesJingbo Xu
Let's enable long xattr name prefix feature. Old kernels will just ignore / skip such extended attributes. In addition, in case you don't want to mount such images, add another incompatible feature as an option for this. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407222808.19670-1-jefflexu@linux.alibaba.com [ Gao Xiang: minor commit message fix. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: add helpers to load long xattr name prefixesJingbo Xu
Long xattr name prefixes will be scanned upon mounting and the in-memory long xattr name prefix array will be initialized accordingly. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: move packed inode out of the compression partJingbo Xu
packed inode could be used in more scenarios which are independent of compression in the future. For example, packed inode could be used to keep extra long xattr prefixes with the help of following patches. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: keep meta inode into erofs_bufGao Xiang
So that erofs_read_metadata() can read metadata from other inodes (e.g. packed inode) as well. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: initialize packed inode after root inode is assignedJingbo Xu
As commit 8f7acdae2cd4 ("staging: erofs: kill all failure handling in fill_super()"), move the initialization of packed inode after root inode is assigned, so that the iput() in .put_super() is adequate as the failure handling. Otherwise, iput() is also needed in .kill_sb(), in case of the mounting fails halfway. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Fixes: b15b2e307c3a ("erofs: support on-disk compressed fragments data") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: don't warn ztailpacking feature anymoreYue Hu
The ztailpacking feature has been merged for a year, it has been mostly stable now. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230227084457.3510-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: support flattened block device for multi-blob imagesJia Zhu
In order to support mounting multi-blobs container image as a single block device, add flattened block device feature for EROFS. In this mode, all meta/data contents will be mapped into one block space. User could compose a block device(by nbd/ublk/virtio-blk/ vhost-user-blk) from multiple sources and mount the block device by EROFS directly. It can reduce the number of block devices used, and it's also benefits in both VM file passthrough and distributed storage scenarios. You can test this using the method mentioned by: https://github.com/dragonflyoss/image-service/pull/1139 1. Compose a (nbd)block device from multi-blobs. 2. Mount EROFS on mntdir/. 3. Compare the md5sum between source dir and mntdir/. Later, we could also use it to refer original tar blobs. Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Tested-by: Jiang Liu <gerry@linux.alibaba.com> Link: https://lore.kernel.org/r/20230302071751.48425-1-zhujia.zj@bytedance.com [ Gao Xiang: refine commit message and use erofs_pos(). ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: set block size to the on-disk block sizeJingbo Xu
Set the block size to that specified in on-disk superblock. Also remove the hard constraint of PAGE_SIZE block size for the uncompressed device backend. This constraint is temporarily remained for compressed device and fscache backend, as there is more work needed to handle the condition where the block size is not equal to PAGE_SIZE. It is worth noting that the on-disk block size is read prior to erofs_superblock_csum_verify(), as the read block size is needed in the latter. Besides, later we are going to make erofs refer to tar data blobs (which is 512-byte aligned) for OCI containers, where the block size is 512 bytes. In this case, the 512-byte block size may not be adequate for a directory to contain enough dirents. To fix this, we are also going to introduce directory block size independent on the block size. Due to we have already supported block size smaller than PAGE_SIZE now, disable all these images with such separated directory block size until we supported this feature later. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230313135309.75269-3-jefflexu@linux.alibaba.com [ Gao Xiang: update documentation. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: avoid hardcoded blocksize for subpage block supportJingbo Xu
As the first step of converting hardcoded blocksize to that specified in on-disk superblock, convert all call sites of hardcoded blocksize to sb->s_blocksize except for: 1) use sbi->blkszbits instead of sb->s_blocksize in erofs_superblock_csum_verify() since sb->s_blocksize has not been updated with the on-disk blocksize yet when the function is called. 2) use inode->i_blkbits instead of sb->s_blocksize in erofs_bread(), since the inode operated on may be an anonymous inode in fscache mode. Currently the anonymous inode is allocated from an anonymous mount maintained in erofs, while in the near future we may allocate anonymous inodes from a generic API directly and thus have no access to the anonymous inode's i_sb. Thus we keep the block size in i_blkbits for anonymous inodes in fscache mode. Be noted that this patch only gets rid of the hardcoded blocksize, in preparation for actually setting the on-disk block size in the following patch. The hard limit of constraining the block size to PAGE_SIZE still exists until the next patch. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230313135309.75269-2-jefflexu@linux.alibaba.com [ Gao Xiang: fold a patch to fix incorrect truncated offsets. ] Link: https://lore.kernel.org/r/20230413035734.15457-1-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: unify anonymous inodes for blobJingbo Xu
Currently there're two anonymous inodes (inode and anon_inode in struct erofs_fscache) for each blob. The former was introduced as the address_space of page cache for bootstrap. The latter was initially introduced as both the address_space of page cache and also a sentinel in the shared domain. Since now the management of cookies in share domain has been decoupled with the anonymous inode, there's no need to maintain an extra anonymous inode. Let's unify these two anonymous inodes. Besides, in non-share-domain mode only bootstrap will allocate anonymous inode. To simplify the implementation, always allocate anonymous inode for both bootstrap and data blobs. Similarly release anonymous inodes for data blobs when .put_super() is called, or we'll get "VFS: Busy inodes after unmount." warning. Also remove the redundant set_nlink() when initializing the anonymous inode, since i_nlink has already been initialized to 1 when the inode gets allocated. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Link: https://lore.kernel.org/r/20230209063913.46341-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: remove linux/buffer_head.h dependencyGao Xiang
EROFS actually never uses buffer heads, therefore just get rid of BH_xxx definitions and linux/buffer_head.h inclusive. Link: https://lore.kernel.org/r/20230113065226.68801-2-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-01-16erofs: clean up parsing of fscache related optionsJingbo Xu
... to avoid the mess of conditional preprocessing as we are continually adding fscache related mount options. Reviewd-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20230112065431.124926-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-12-07erofs: check the uniqueness of fsid in shared domain in advanceHou Tao
When shared domain is enabled, doing mount twice with the same fsid and domain_id will trigger sysfs warning as shown below: sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin' CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 sysfs_warn_dup.cold+0x17/0x27 sysfs_create_dir_ns+0xb8/0xd0 kobject_add_internal+0xb1/0x240 kobject_init_and_add+0x71/0xa0 erofs_register_sysfs+0x89/0x110 erofs_fc_fill_super+0x98c/0xaf0 vfs_get_super+0x7d/0x100 get_tree_nodev+0x16/0x20 erofs_fc_get_tree+0x20/0x30 vfs_get_tree+0x24/0xb0 path_mount+0x2fa/0xa90 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x30/0x60 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The reason is erofs_fscache_register_cookie() doesn't guarantee the primary data blob (aka fsid) is unique in the shared domain and erofs_register_sysfs() invoked by the second mount will fail due to the duplicated fsid in the shared domain and report warning. It would be better to check the uniqueness of fsid before doing erofs_register_sysfs(), so adding a new flags parameter for erofs_fscache_register_cookie() and doing the uniqueness check if EROFS_REG_COOKIE_NEED_NOEXIST is enabled. After the patch, the error in dmesg for the duplicated mount would be: erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221125110822.3812942-1-houtao@huaweicloud.com Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-10erofs: fix use-after-free of fsid and domain_id stringJingbo Xu
When erofs instance is remounted with fsid or domain_id mount option specified, the original fsid and domain_id string pointer in sbi->opt is directly overridden with the fsid and domain_id string in the new fs_context, without freeing the original fsid and domain_id string. What's worse, when the new fsid and domain_id string is transferred to sbi, they are not reset to NULL in fs_context, and thus they are freed when remount finishes, while sbi is still referring to these strings. Reconfiguration for fsid and domain_id seems unusual. Thus clarify this restriction explicitly and dump a warning when users are attempting to do this. Besides, to fix the use-after-free issue, move fsid and domain_id from erofs_mount_opts to outside. Fixes: c6be2bd0a5dd ("erofs: register fscache volume") Fixes: 8b7adf1dff3d ("erofs: introduce fscache-based domain") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20221021023153.1330-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-27erofs: clean up erofs_iget()Gao Xiang
isdir indicated REQ_META|REQ_PRIO which no longer works now. Get rid of isdir entirely. Link: https://lore.kernel.org/r/20220927063607.54832-2-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-26erofs: introduce partial-referenced pclustersGao Xiang
Due to deduplication for compressed data, pclusters can be partially referenced with their prefixes. Together with the user-space implementation, it enables EROFS variable-length global compressed data deduplication with rolling hash. Link: https://lore.kernel.org/r/20220923014915.4362-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-26erofs: support on-disk compressed fragments dataYue Hu
Introduce on-disk compressed fragments data feature. This approach adds a new field called `h_fragmentoff' in the per-file compression header to indicate the fragment offset of each tail pcluster or the whole file in the special packed inode. Similar to ztailpacking, it will also find and record the 'headlcn' of the tail pcluster when initializing per-inode zmap for making follow-on requests more easy. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/YzHKxcFTlHGgXeH9@B-P7TQMD6M-0146.local Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-20erofs: introduce 'domain_id' mount optionJia Zhu
Introduce 'domain_id' mount option to enable shared domain sementics. In which case, the related cookie is shared if two mountpoints in the same domain have the same data blob. Users could specify the name of domain by this mount option. Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220918043456.147-7-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>