diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/f2fs.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/fscrypt.rst | 4 | ||||
-rw-r--r-- | Documentation/filesystems/fsverity.rst | 96 | ||||
-rw-r--r-- | Documentation/filesystems/locking.rst | 24 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/client-identifier.rst | 4 | ||||
-rw-r--r-- | Documentation/filesystems/proc.rst | 1 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.rst | 24 |
7 files changed, 77 insertions, 78 deletions
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 220f3e0d3f55..2055e72871fe 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -158,7 +158,7 @@ nobarrier This option can be used if underlying storage guarantees If this option is set, no cache_flush commands are issued but f2fs still guarantees the write ordering of all the data writes. -barrier If this option is set, cache_flush commands are allowed to be +barrier If this option is set, cache_flush commands are allowed to be issued. fastboot This option is used when a system wants to reduce mount time as much as possible, even though normal performance diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index ef183387da20..eccd327e6df5 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -1277,8 +1277,8 @@ the file contents themselves, as described below: For the read path (->read_folio()) of regular files, filesystems can read the ciphertext into the page cache and decrypt it in-place. The -page lock must be held until decryption has finished, to prevent the -page from becoming visible to userspace prematurely. +folio lock must be held until decryption has finished, to prevent the +folio from becoming visible to userspace prematurely. For the write path (->writepage()) of regular files, filesystems cannot encrypt data in-place in the page cache, since the cached diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index cb8e7573882a..ede672dedf11 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -118,10 +118,11 @@ as follows: - ``hash_algorithm`` must be the identifier for the hash algorithm to use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See ``include/uapi/linux/fsverity.h`` for the list of possible values. -- ``block_size`` must be the Merkle tree block size. Currently, this - must be equal to the system page size, which is usually 4096 bytes. - Other sizes may be supported in the future. This value is not - necessarily the same as the filesystem block size. +- ``block_size`` is the Merkle tree block size, in bytes. In Linux + v6.3 and later, this can be any power of 2 between (inclusively) + 1024 and the minimum of the system page size and the filesystem + block size. In earlier versions, the page size was the only allowed + value. - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is provided. The salt is a value that is prepended to every hashed block; it can be used to personalize the hashing for a particular @@ -161,6 +162,7 @@ FS_IOC_ENABLE_VERITY can fail with the following errors: - ``EBUSY``: this ioctl is already running on the file - ``EEXIST``: the file already has verity enabled - ``EFAULT``: the caller provided inaccessible memory +- ``EFBIG``: the file is too large to enable verity on - ``EINTR``: the operation was interrupted by a fatal signal - ``EINVAL``: unsupported version, hash algorithm, or block size; or reserved bits are set; or the file descriptor refers to neither a @@ -495,9 +497,11 @@ To create verity files on an ext4 filesystem, the filesystem must have been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on it. "verity" is an RO_COMPAT filesystem feature, so once set, old kernels will only be able to mount the filesystem readonly, and old -versions of e2fsck will be unable to check the filesystem. Moreover, -currently ext4 only supports mounting a filesystem with the "verity" -feature when its block size is equal to PAGE_SIZE (often 4096 bytes). +versions of e2fsck will be unable to check the filesystem. + +Originally, an ext4 filesystem with the "verity" feature could only be +mounted when its block size was equal to the system page size +(typically 4096 bytes). In Linux v6.3, this limitation was removed. ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. @@ -518,9 +522,7 @@ support paging multi-gigabyte xattrs into memory, and to support encrypting xattrs. Note that the verity metadata *must* be encrypted when the file is, since it contains hashes of the plaintext data. -Currently, ext4 verity only supports the case where the Merkle tree -block size, filesystem block size, and page size are all the same. It -also only supports extent-based files. +ext4 only allows verity on extent-based files. f2fs ---- @@ -538,11 +540,10 @@ Like ext4, f2fs stores the verity metadata (Merkle tree and fsverity_descriptor) past the end of the file, starting at the first 64K boundary beyond i_size. See explanation for ext4 above. Moreover, f2fs supports at most 4096 bytes of xattr entries per inode -which wouldn't be enough for even a single Merkle tree block. +which usually wouldn't be enough for even a single Merkle tree block. -Currently, f2fs verity only supports a Merkle tree block size of 4096. -Also, f2fs doesn't support enabling verity on files that currently -have atomic or volatile writes pending. +f2fs doesn't support enabling verity on files that currently have +atomic or volatile writes pending. btrfs ----- @@ -567,51 +568,48 @@ Pagecache ~~~~~~~~~ For filesystems using Linux's pagecache, the ``->read_folio()`` and -``->readahead()`` methods must be modified to verify pages before they -are marked Uptodate. Merely hooking ``->read_iter()`` would be +``->readahead()`` methods must be modified to verify folios before +they are marked Uptodate. Merely hooking ``->read_iter()`` would be insufficient, since ``->read_iter()`` is not used for memory maps. -Therefore, fs/verity/ provides a function fsverity_verify_page() which -verifies a page that has been read into the pagecache of a verity -inode, but is still locked and not Uptodate, so it's not yet readable -by userspace. As needed to do the verification, -fsverity_verify_page() will call back into the filesystem to read -Merkle tree pages via fsverity_operations::read_merkle_tree_page(). +Therefore, fs/verity/ provides the function fsverity_verify_blocks() +which verifies data that has been read into the pagecache of a verity +inode. The containing folio must still be locked and not Uptodate, so +it's not yet readable by userspace. As needed to do the verification, +fsverity_verify_blocks() will call back into the filesystem to read +hash blocks via fsverity_operations::read_merkle_tree_page(). -fsverity_verify_page() returns false if verification failed; in this -case, the filesystem must not set the page Uptodate. Following this, +fsverity_verify_blocks() returns false if verification failed; in this +case, the filesystem must not set the folio Uptodate. Following this, as per the usual Linux pagecache behavior, attempts by userspace to -read() from the part of the file containing the page will fail with -EIO, and accesses to the page within a memory map will raise SIGBUS. - -fsverity_verify_page() currently only supports the case where the -Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes). +read() from the part of the file containing the folio will fail with +EIO, and accesses to the folio within a memory map will raise SIGBUS. -In principle, fsverity_verify_page() verifies the entire path in the -Merkle tree from the data page to the root hash. However, for -efficiency the filesystem may cache the hash pages. Therefore, -fsverity_verify_page() only ascends the tree reading hash pages until -an already-verified hash page is seen, as indicated by the PageChecked -bit being set. It then verifies the path to that page. +In principle, verifying a data block requires verifying the entire +path in the Merkle tree from the data block to the root hash. +However, for efficiency the filesystem may cache the hash blocks. +Therefore, fsverity_verify_blocks() only ascends the tree reading hash +blocks until an already-verified hash block is seen. It then verifies +the path to that block. This optimization, which is also used by dm-verity, results in excellent sequential read performance. This is because usually (e.g. -127 in 128 times for 4K blocks and SHA-256) the hash page from the +127 in 128 times for 4K blocks and SHA-256) the hash block from the bottom level of the tree will already be cached and checked from -reading a previous data page. However, random reads perform worse. +reading a previous data block. However, random reads perform worse. Block device based filesystems ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Block device based filesystems (e.g. ext4 and f2fs) in Linux also use the pagecache, so the above subsection applies too. However, they -also usually read many pages from a file at once, grouped into a +also usually read many data blocks from a file at once, grouped into a structure called a "bio". To make it easier for these types of filesystems to support fs-verity, fs/verity/ also provides a function -fsverity_verify_bio() which verifies all pages in a bio. +fsverity_verify_bio() which verifies all data blocks in a bio. ext4 and f2fs also support encryption. If a verity file is also -encrypted, the pages must be decrypted before being verified. To +encrypted, the data must be decrypted before being verified. To support this, these filesystems allocate a "post-read context" for each bio and store it in ``->bi_private``:: @@ -626,14 +624,14 @@ each bio and store it in ``->bi_private``:: verity, or both is enabled. After the bio completes, for each needed postprocessing step the filesystem enqueues the bio_post_read_ctx on a workqueue, and then the workqueue work does the decryption or -verification. Finally, pages where no decryption or verity error -occurred are marked Uptodate, and the pages are unlocked. +verification. Finally, folios where no decryption or verity error +occurred are marked Uptodate, and the folios are unlocked. On many filesystems, files can contain holes. Normally, -``->readahead()`` simply zeroes holes and sets the corresponding pages -Uptodate; no bios are issued. To prevent this case from bypassing -fs-verity, these filesystems use fsverity_verify_page() to verify hole -pages. +``->readahead()`` simply zeroes hole blocks and considers the +corresponding data to be up-to-date; no bios are issued. To prevent +this case from bypassing fs-verity, filesystems use +fsverity_verify_blocks() to verify hole blocks. Filesystems also disable direct I/O on verity files, since otherwise direct I/O would bypass fs-verity. @@ -644,7 +642,7 @@ Userspace utility This document focuses on the kernel, but a userspace utility for fs-verity can be found at: - https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git + https://git.kernel.org/pub/scm/fs/fsverity/fsverity-utils.git See the README.md file in the fsverity-utils source tree for details, including examples of setting up fs-verity protected files. @@ -793,9 +791,9 @@ weren't already directly answered in other parts of this document. :A: There are many reasons why this is not possible or would be very difficult, including the following: - - To prevent bypassing verification, pages must not be marked + - To prevent bypassing verification, folios must not be marked Uptodate until they've been verified. Currently, each - filesystem is responsible for marking pages Uptodate via + filesystem is responsible for marking folios Uptodate via ``->readahead()``. Therefore, currently it's not possible for the VFS to do the verification on its own. Changing this would require significant changes to the VFS and all filesystems. diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 36fa2a83d714..7de7a7272a5e 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -56,35 +56,35 @@ inode_operations prototypes:: - int (*create) (struct inode *,struct dentry *,umode_t, bool); + int (*create) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t, bool); struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); - int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,umode_t); + int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *); + int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); - int (*rename) (struct inode *, struct dentry *, + int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t); + int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int); int (*readlink) (struct dentry *, char __user *,int); const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *); void (*truncate) (struct inode *); - int (*permission) (struct inode *, int, unsigned int); + int (*permission) (struct mnt_idmap *, struct inode *, int, unsigned int); struct posix_acl * (*get_inode_acl)(struct inode *, int, bool); - int (*setattr) (struct dentry *, struct iattr *); - int (*getattr) (const struct path *, struct kstat *, u32, unsigned int); + int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *); + int (*getattr) (struct mnt_idmap *, const struct path *, struct kstat *, u32, unsigned int); ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len); void (*update_time)(struct inode *, struct timespec *, int); int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode); - int (*tmpfile) (struct user_namespace *, struct inode *, + int (*tmpfile) (struct mnt_idmap *, struct inode *, struct file *, umode_t); - int (*fileattr_set)(struct user_namespace *mnt_userns, + int (*fileattr_set)(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); - struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int); + struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int); locking rules: all may block @@ -135,7 +135,7 @@ prototypes:: struct inode *inode, const char *name, void *buffer, size_t size); int (*set)(const struct xattr_handler *handler, - struct user_namespace *mnt_userns, + struct mnt_idmap *idmap, struct dentry *dentry, struct inode *inode, const char *name, const void *buffer, size_t size, int flags); diff --git a/Documentation/filesystems/nfs/client-identifier.rst b/Documentation/filesystems/nfs/client-identifier.rst index 5147e15815a1..a94c7a9748d7 100644 --- a/Documentation/filesystems/nfs/client-identifier.rst +++ b/Documentation/filesystems/nfs/client-identifier.rst @@ -152,7 +152,7 @@ string: via the kernel command line, or when the "nfs" module is loaded. - /sys/fs/nfs/client/net/identifier + /sys/fs/nfs/net/nfs_client/identifier This virtual file, available since Linux 5.3, is local to the network namespace in which it is accessed and so can provide distinction between network namespaces (containers) when the @@ -164,7 +164,7 @@ then that uniquifier can be used. For example, a uniquifier might be formed at boot using the container's internal identifier: sha256sum /etc/machine-id | awk '{print $1}' \\ - > /sys/fs/nfs/client/net/identifier + > /sys/fs/nfs/net/nfs_client/identifier Security considerations ----------------------- diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index e224b6d5b642..9d5fd9424e8b 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -1284,6 +1284,7 @@ support this. Table 1-9 lists the files and their meaning. rt_cache Routing cache snmp SNMP data sockstat Socket statistics + softnet_stat Per-CPU incoming packets queues statistics of online CPUs tcp TCP sockets udp UDP sockets unix UNIX domain sockets diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 2c15e7053113..c53f30251a66 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -421,31 +421,31 @@ As of kernel 2.6.22, the following members are defined: .. code-block:: c struct inode_operations { - int (*create) (struct user_namespace *, struct inode *,struct dentry *, umode_t, bool); + int (*create) (struct mnt_idmap *, struct inode *,struct dentry *, umode_t, bool); struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); - int (*symlink) (struct user_namespace *, struct inode *,struct dentry *,const char *); - int (*mkdir) (struct user_namespace *, struct inode *,struct dentry *,umode_t); + int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *); + int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct user_namespace *, struct inode *,struct dentry *,umode_t,dev_t); - int (*rename) (struct user_namespace *, struct inode *, struct dentry *, + int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t); + int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int); int (*readlink) (struct dentry *, char __user *,int); const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *); - int (*permission) (struct user_namespace *, struct inode *, int); + int (*permission) (struct mnt_idmap *, struct inode *, int); struct posix_acl * (*get_inode_acl)(struct inode *, int, bool); - int (*setattr) (struct user_namespace *, struct dentry *, struct iattr *); - int (*getattr) (struct user_namespace *, const struct path *, struct kstat *, u32, unsigned int); + int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *); + int (*getattr) (struct mnt_idmap *, const struct path *, struct kstat *, u32, unsigned int); ssize_t (*listxattr) (struct dentry *, char *, size_t); void (*update_time)(struct inode *, struct timespec *, int); int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode); - int (*tmpfile) (struct user_namespace *, struct inode *, struct file *, umode_t); - struct posix_acl * (*get_acl)(struct user_namespace *, struct dentry *, int); - int (*set_acl)(struct user_namespace *, struct dentry *, struct posix_acl *, int); - int (*fileattr_set)(struct user_namespace *mnt_userns, + int (*tmpfile) (struct mnt_idmap *, struct inode *, struct file *, umode_t); + struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int); + int (*set_acl)(struct mnt_idmap *, struct dentry *, struct posix_acl *, int); + int (*fileattr_set)(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); }; |