diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-24 18:42:04 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-24 18:42:04 -0700 |
commit | 65965d9530b0c320759cd18a9a5975fb2e098462 (patch) | |
tree | 150e4b53ada8c9c5a9d4db50feb113418103e9b5 /fs/erofs/internal.h | |
parent | 850f6033cd2bf3b1fcbf9a20d078edab7e7c67b4 (diff) | |
parent | ba73eadd23d1c2dc5c8dc0c0ae2eeca2b9b709a7 (diff) |
Merge tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs (and fscache) updates from Gao Xiang:
"After working on it on the mailing list for more than half a year, we
finally form 'erofs over fscache' feature into shape. Hopefully it
could bring more possibility to the communities.
The story mainly started from a new project what we called "RAFS v6" [1]
for Nydus image service almost a year ago, which enhances EROFS to be
a new form of one bootstrap (which includes metadata representing the
whole fs tree) + several data-deduplicated content addressable blobs
(actually treated as multiple devices). Each blob can represent one
container image layer but not quite exactly since all new data can be
fully existed in the previous blobs so no need to introduce another
new blob.
It is actually not a new idea (at least on my side it's much like a
simpilied casync [2] for now) and has many benefits over per-file
blobs or some other exist ways since typically each RAFS v6 image only
has dozens of device blobs instead of thousands of per-file blobs.
It's easy to be signed with user keys as a golden image, transfered
untouchedly with minimal overhead over the network, kept in some type
of storage conveniently, and run with (optional) runtime verification
but without involving too many irrelevant features crossing the system
beyond EROFS itself. At least it's our final goal and we're keeping
working on it. There was also a good summary of this approach from the
casync author [3].
Regardless further optimizations, this work is almost done in the
previous Linux release cycles. In this round, we'd like to introduce
on-demand load for EROFS with the fscache/cachefiles infrastructure,
considering the following advantages:
- Introduce new file-based backend to EROFS. Although each image only
contains dozens of blobs but in densely-deployed runC host for
example, there could still be massive blobs on a machine, which is
messy if each blob is treated as a device. In contrast, fscache and
cachefiles are really great interfaces for us to make them work.
- Introduce on-demand load to fscache and EROFS. Previously, fscache
is mainly used to caching network-likewise filesystems, now it can
support on-demand downloading for local fses too with the exact
localfs on-disk format. It has many advantages which we're been
described in the latest patchset cover letter [4]. In addition to
that, most importantly, the cached data is still stored in the
original local fs on-disk format so that it's still the one signed
with private keys but only could be partially available. Users can
fully trust it during running. Later, users can also back up
cachefiles easily to another machine.
- More reliable on-demand approach in principle. After data is all
available locally, user daemon can be no longer online in some use
cases, which helps daemon crash recovery (filesystems can still in
service) and hot-upgrade (user daemon can be upgraded more
frequently due to new features or protocols introduced.)
- Other format can also be converted to EROFS filesystem format over
the internet on the fly with the new on-demand load feature and
mounted. That is entirely possible with on-demand load feature as
long as such archive format metadata can be fetched in advance like
stargz.
In addition, although currently our target user is Nydus image service [5],
but laterly, it can be used for other use cases like on-demand system
booting, etc. As for the fscache on-demand load feature itself,
strictly it can be used for other local fses too. Laterly we could
promote most code to the iomap infrastructure and also enhance it in
the read-write way if other local fses are interested.
Thanks David Howells for taking so much time and patience on this
these months, many thanks with great respect here again! Thanks Jeffle
for working on this feature and Xin Yin from Bytedance for
asynchronous I/O implementation as well as Zichen Tian, Jia Zhu, and
Yan Song for testing, much appeciated. We're also exploring more
possibly over fscache cache management over FSDAX for secure
containers and working on more improvements and useful features for
fscache, cachefiles, and on-demand load.
In addition to "erofs over fscache", NFS export and idmapped mount are
also completed in this cycle for container use cases as well.
Summary:
- Add erofs on-demand load support over fscache
- Support NFS export for erofs
- Support idmapped mounts for erofs
- Don't prompt for risk any more when using big pcluster
- Fix buffer copy overflow of ztailpacking feature
- Several minor cleanups"
[1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
[2] https://github.com/systemd/casync
[3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
[4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
[5] https://github.com/dragonflyoss/image-service
* tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
erofs: scan devices from device table
erofs: change to use asynchronous io for fscache readpage/readahead
erofs: add 'fsid' mount option
erofs: implement fscache-based data readahead
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based metadata read
erofs: register fscache context for extra data blobs
erofs: register fscache context for primary data blob
erofs: add erofs_fscache_read_folios() helper
erofs: add anonymous inode caching metadata for data blobs
erofs: add fscache context helper functions
erofs: register fscache volume
erofs: add fscache mode check helper
erofs: make erofs_map_blocks() generally available
cachefiles: document on-demand read mode
cachefiles: add tracepoints for on-demand read mode
cachefiles: enable on-demand read mode
cachefiles: implement on-demand read
cachefiles: notify the user daemon when withdrawing cookie
...
Diffstat (limited to 'fs/erofs/internal.h')
-rw-r--r-- | fs/erofs/internal.h | 76 |
1 files changed, 50 insertions, 26 deletions
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 5298c4ee277d..cfee49d33b95 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t; struct erofs_device_info { char *path; + struct erofs_fscache *fscache; struct block_device *bdev; struct dax_device *dax_dev; u64 dax_part_off; @@ -74,6 +75,7 @@ struct erofs_mount_opts { unsigned int max_sync_decompress_pages; #endif unsigned int mount_opt; + char *fsid; }; struct erofs_dev_context { @@ -96,6 +98,11 @@ struct erofs_sb_lz4_info { u16 max_pclusterblks; }; +struct erofs_fscache { + struct fscache_cookie *cookie; + struct inode *inode; +}; + struct erofs_sb_info { struct erofs_mount_opts opt; /* options */ #ifdef CONFIG_EROFS_FS_ZIP @@ -146,6 +153,10 @@ struct erofs_sb_info { /* sysfs support */ struct kobject s_kobj; /* /sys/fs/erofs/<devname> */ struct completion s_kobj_unregister; + + /* fscache support */ + struct fscache_volume *volume; + struct erofs_fscache *s_fscache; }; #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) @@ -161,6 +172,11 @@ struct erofs_sb_info { #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option) #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option) +static inline bool erofs_is_fscache_mode(struct super_block *sb) +{ + return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev; +} + enum { EROFS_ZIP_CACHE_DISABLED, EROFS_ZIP_CACHE_READAHEAD, @@ -381,31 +397,6 @@ extern const struct super_operations erofs_sops; extern const struct address_space_operations erofs_raw_access_aops; extern const struct address_space_operations z_erofs_aops; -/* - * Logical to physical block mapping - * - * Different with other file systems, it is used for 2 access modes: - * - * 1) RAW access mode: - * - * Users pass a valid (m_lblk, m_lofs -- usually 0) pair, - * and get the valid m_pblk, m_pofs and the longest m_len(in bytes). - * - * Note that m_lblk in the RAW access mode refers to the number of - * the compressed ondisk block rather than the uncompressed - * in-memory block for the compressed file. - * - * m_pofs equals to m_lofs except for the inline data page. - * - * 2) Normal access mode: - * - * If the inode is not compressed, it has no difference with - * the RAW access mode. However, if the inode is compressed, - * users should pass a valid (m_lblk, m_lofs) pair, and get - * the needed m_pblk, m_pofs, m_len to get the compressed data - * and the updated m_lblk, m_lofs which indicates the start - * of the corresponding uncompressed data in the file. - */ enum { BH_Encoded = BH_PrivateStart, BH_FullMapped, @@ -467,6 +458,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode, #endif /* !CONFIG_EROFS_FS_ZIP */ struct erofs_map_dev { + struct erofs_fscache *m_fscache; struct block_device *m_bdev; struct dax_device *m_daxdev; u64 m_dax_part_off; @@ -486,6 +478,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev); int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len); +int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags); /* inode.c */ static inline unsigned long erofs_inode_hash(erofs_nid_t nid) @@ -509,7 +503,7 @@ int erofs_getattr(struct user_namespace *mnt_userns, const struct path *path, /* namei.c */ extern const struct inode_operations erofs_dir_iops; -int erofs_namei(struct inode *dir, struct qstr *name, +int erofs_namei(struct inode *dir, const struct qstr *name, erofs_nid_t *nid, unsigned int *d_type); /* dir.c */ @@ -611,6 +605,36 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb, } #endif /* !CONFIG_EROFS_FS_ZIP */ +/* fscache.c */ +#ifdef CONFIG_EROFS_FS_ONDEMAND +int erofs_fscache_register_fs(struct super_block *sb); +void erofs_fscache_unregister_fs(struct super_block *sb); + +int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, + char *name, bool need_inode); +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); + +extern const struct address_space_operations erofs_fscache_access_aops; +#else +static inline int erofs_fscache_register_fs(struct super_block *sb) +{ + return 0; +} +static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} + +static inline int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, + char *name, bool need_inode) +{ + return -EOPNOTSUPP; +} + +static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +{ +} +#endif + #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ #endif /* __EROFS_INTERNAL_H */ |