linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2023-10-16	fscrypt: track master key presence separately from secret	Eric Biggers
	Master keys can be in one of three states: present, incompletely removed, and absent (as per FSCRYPT_KEY_STATUS_* used in the UAPI). Currently, the way that "present" is distinguished from "incompletely removed" internally is by whether ->mk_secret exists or not. With extent-based encryption, it will be necessary to allow per-extent keys to be derived while the master key is incompletely removed, so that I/O on open files will reliably continue working after removal of the key has been initiated. (We could allow I/O to sometimes fail in that case, but that seems problematic for reasons such as writes getting silently thrown away and diverging from the existing fscrypt semantics.) Therefore, when the filesystem is using extent-based encryption, ->mk_secret can't be wiped when the key becomes incompletely removed. As a prerequisite for doing that, this patch makes the "present" state be tracked using a new field, ->mk_present. No behavior is changed yet. The basic idea here is borrowed from Josef Bacik's patch "fscrypt: use a flag to indicate that the master key is being evicted" (https://lore.kernel.org/r/e86c16dddc049ff065f877d793ad773e4c6bfad9.1696970227.git.josef@toxicpanda.com). I reimplemented it using a "present" bool instead of an "evicted" flag, fixed a couple bugs, and tried to update everything to be consistent. Note: I considered adding a ->mk_status field instead, holding one of FSCRYPT_KEY_STATUS_*. At first that seemed nice, but it ended up being more complex (despite simplifying FS_IOC_GET_ENCRYPTION_KEY_STATUS), since it would have introduced redundancy and had weird locking rules. Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231015061055.62673-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-10-08	fscrypt: rename fscrypt_info => fscrypt_inode_info	Josef Bacik
	We are going to track per-extent information, so it'll be necessary to distinguish between inode infos and extent infos. Rename fscrypt_info to fscrypt_inode_info, adjusting any lines that now exceed 80 characters. Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ebiggers: rebased onto fscrypt tree, renamed fscrypt_get_info(), adjusted two comments, and fixed some lines over 80 characters] Link: https://lore.kernel.org/r/20231005025757.33521-1-ebiggers@kernel.org Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-09-25	fscrypt: support crypto data unit size less than filesystem block size	Eric Biggers
	Until now, fscrypt has always used the filesystem block size as the granularity of file contents encryption. Two scenarios have come up where a sub-block granularity of contents encryption would be useful: 1. Inline crypto hardware that only supports a crypto data unit size that is less than the filesystem block size. 2. Support for direct I/O at a granularity less than the filesystem block size, for example at the block device's logical block size in order to match the traditional direct I/O alignment requirement. (1) first came up with older eMMC inline crypto hardware that only supports a crypto data unit size of 512 bytes. That specific case ultimately went away because all systems with that hardware continued using out of tree code and never actually upgraded to the upstream inline crypto framework. But, now it's coming back in a new way: some current UFS controllers only support a data unit size of 4096 bytes, and there is a proposal to increase the filesystem block size to 16K. (2) was discussed as a "nice to have" feature, though not essential, when support for direct I/O on encrypted files was being upstreamed. Still, the fact that this feature has come up several times does suggest it would be wise to have available. Therefore, this patch implements it by using one of the reserved bytes in fscrypt_policy_v2 to allow users to select a sub-block data unit size. Supported data unit sizes are powers of 2 between 512 and the filesystem block size, inclusively. Support is implemented for both the FS-layer and inline crypto cases. This patch focuses on the basic support for sub-block data units. Some things are out of scope for this patch but may be addressed later: - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this combination usually causes data unit indices to exceed 32 bits, and thus fscrypt_supported_policy() correctly disallows it. The users who potentially need this combination are using f2fs. To support it, f2fs would need to provide an option to slightly reduce its max file size. - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem described above, but also it will need special code to make DUN wraparound still happen on a FS block boundary. - Supporting use case (2) mentioned above. The encrypted direct I/O code will need to stop requiring and assuming FS block alignment. This won't be hard, but it belongs in a separate patch. - Supporting this feature on filesystems other than ext4 and f2fs. (Filesystems declare support for it via their fscrypt_operations.) On UBIFS, sub-block data units don't make sense because UBIFS encrypts variable-length blocks as a result of compression. CephFS could support it, but a bit more work would be needed to make the fscrypt_*_block_inplace functions play nicely with sub-block data units. I don't think there's a use case for this on CephFS anyway. Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-06	fscrypt: optimize fscrypt_initialize()	Eric Biggers
	fscrypt_initialize() is a "one-time init" function that is called whenever the key is set up for any inode on any filesystem. Make it implement "one-time init" more efficiently by not taking a global mutex in the "already initialized case" and doing fewer pointer dereferences. Link: https://lore.kernel.org/r/20230406181245.36091-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27	fscrypt: use WARN_ON_ONCE instead of WARN_ON	Eric Biggers
	As per Linus's suggestion (https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com), use WARN_ON_ONCE instead of WARN_ON. This barely adds any extra overhead, and it makes it so that if any of these ever becomes reachable (they shouldn't, but that's the point), the logs can't be flooded. Link: https://lore.kernel.org/r/20230320233943.73600-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-02-07	fscrypt: clean up fscrypt_add_test_dummy_key()	Eric Biggers
	Now that fscrypt_add_test_dummy_key() is only called by setup_file_encryption_key() and not by the individual filesystems, un-export it. Also change its prototype to take the fscrypt_key_specifier directly, as the caller already has it. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230208062107.199831-6-ebiggers@kernel.org
2023-02-07	fscrypt: add the test dummy encryption key on-demand	Eric Biggers
	When the key for an inode is not found but the inode is using the test_dummy_encryption policy, automatically add the test_dummy_encryption key to the filesystem keyring. This eliminates the need for all the individual filesystems to do this at mount time, which is a bit tricky to clean up from on failure. Note: this covers the call to fscrypt_find_master_key() from inode key setup, but not from the fscrypt ioctls. So, this isn't exactly the same as the key being present from the very beginning. I think we can tolerate that, though, since the inode key setup caller is the only one that actually matters in the context of test_dummy_encryption. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230208062107.199831-2-ebiggers@kernel.org
2022-12-01	fscrypt: Add SM4 XTS/CTS symmetric algorithm support	Tianjia Zhang
	Add support for XTS and CTS mode variant of SM4 algorithm. The former is used to encrypt file contents, while the latter (SM4-CTS-CBC) is used to encrypt filenames. SM4 is a symmetric algorithm widely used in China, and is even mandatory algorithm in some special scenarios. We need to provide these users with the ability to encrypt files or disks using SM4-XTS. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221201125819.36932-3-tianjia.zhang@linux.alibaba.com
2022-11-15	fscrypt: pass super_block to fscrypt_put_master_key_activeref()	Eric Biggers
	As this code confused Linus [1], pass the super_block as an argument to fscrypt_put_master_key_activeref(). This removes the need to have the back-pointer ->mk_sb, so remove that. [1] https://lore.kernel.org/linux-fscrypt/CAHk-=wgud4Bc_um+htgfagYpZAnOoCb3NUoW67hc9LhOKsMtJg@mail.gmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221110082942.351615-1-ebiggers@kernel.org
2022-09-21	fscrypt: stop holding extra request_queue references	Eric Biggers
	Now that the fscrypt_master_key lifetime has been reworked to not be subject to the quirks of the keyrings subsystem, blk_crypto_evict_key() no longer gets called after the filesystem has already been unmounted. Therefore, there is no longer any need to hold extra references to the filesystem's request_queue(s). (And these references didn't always do their intended job anyway, as pinning a request_queue doesn't necessarily pin the corresponding blk_crypto_profile.) Stop taking these extra references. Instead, just pass the super_block to fscrypt_destroy_inline_crypt_key(), and use it to get the list of block devices the key needs to be evicted from. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-3-ebiggers@kernel.org
2022-09-21	fscrypt: stop using keyrings subsystem for fscrypt_master_key	Eric Biggers
	The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved: - When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is not guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free. - When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore. - More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped. - Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done. The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available. A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it. This patch does not change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch. I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support. Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org
2022-06-10	fscrypt: Add HCTR2 support for filename encryption	Nathan Huckleberry
	HCTR2 is a tweakable, length-preserving encryption mode that is intended for use on CPUs with dedicated crypto instructions. HCTR2 has the property that a bitflip in the plaintext changes the entire ciphertext. This property fixes a known weakness with filename encryption: when two filenames in the same directory share a prefix of >= 16 bytes, with AES-CTS-CBC their encrypted filenames share a common substring, leaking information. HCTR2 does not have this problem. More information on HCTR2 can be found here: "Length-preserving encryption with HCTR2": https://eprint.iacr.org/2021/1441.pdf Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-05-09	fscrypt: factor out fscrypt_policy_to_key_spec()	Eric Biggers
	Factor out a function that builds the fscrypt_key_specifier for an fscrypt_policy. Before this was only needed when finding the key for a file, but now it will also be needed for test_dummy_encryption support. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220501050857.538984-4-ebiggers@kernel.org
2022-04-13	fscrypt: log when starting to use inline encryption	Eric Biggers
	When inline encryption is used, the usual message "fscrypt: AES-256-XTS using implementation <impl>" doesn't appear in the kernel log. Add a similar message for the blk-crypto case that indicates that inline encryption was used, and whether blk-crypto-fallback was used or not. This can be useful for debugging performance problems. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220414053415.158986-1-ebiggers@kernel.org
2021-10-25	fscrypt: improve a few comments	Eric Biggers
	Improve a few comments. These were extracted from the patch "fscrypt: add support for hardware-wrapped keys" (https://lore.kernel.org/r/20211021181608.54127-4-ebiggers@kernel.org). Link: https://lore.kernel.org/r/20211026021042.6581-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-09-22	fscrypt: allow 256-bit master keys with AES-256-XTS	Eric Biggers
	fscrypt currently requires a 512-bit master key when AES-256-XTS is used, since AES-256-XTS keys are 512-bit and fscrypt requires that the master key be at least as long any key that will be derived from it. However, this is overly strict because AES-256-XTS doesn't actually have a 512-bit security strength, but rather 256-bit. The fact that XTS takes twice the expected key size is a quirk of the XTS mode. It is sufficient to use 256 bits of entropy for AES-256-XTS, provided that it is first properly expanded into a 512-bit key, which HKDF-SHA512 does. Therefore, relax the check of the master key size to use the security strength of the derived key rather than the size of the derived key (except for v1 encryption policies, which don't use HKDF). Besides making things more flexible for userspace, this is needed in order for the use of a KDF which only takes a 256-bit key to be introduced into the fscrypt key hierarchy. This will happen with hardware-wrapped keys support, as all known hardware which supports that feature uses an SP800-108 KDF using AES-256-CMAC, so the wrapped keys are wrapped 256-bit AES keys. Moreover, there is interest in fscrypt supporting the same type of AES-256-CMAC based KDF in software as an alternative to HKDF-SHA512. There is no security problem with such features, so fix the key length check to work properly with them. Reviewed-by: Paul Crowley <paulcrowley@google.com> Link: https://lore.kernel.org/r/20210921030303.5598-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-06-05	fscrypt: fix derivation of SipHash keys on big endian CPUs	Eric Biggers
	Typically, the cryptographic APIs that fscrypt uses take keys as byte arrays, which avoids endianness issues. However, siphash_key_t is an exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is expected to be given directly as two 64-bit words in CPU endianness. fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key() forgot to take this into account. Therefore, the SipHash keys used to index encrypted+casefolded directories differ on big endian vs. little endian platforms, as do the SipHash keys used to hash inode numbers for IV_INO_LBLK_32-encrypted directories. This makes such directories non-portable between these platforms. Fix this by always using the little endian order. This is a breaking change for big endian platforms, but this should be fine in practice since these features (encrypt+casefold support, and the IV_INO_LBLK_32 flag) aren't known to actually be used on any big endian platforms yet. Fixes: aa408f835d02 ("fscrypt: derive dirhash key for casefolded directories") Fixes: e3b1078bedd3 ("fscrypt: add support for IV_INO_LBLK_32 policies") Cc: <stable@vger.kernel.org> # v5.6+ Link: https://lore.kernel.org/r/20210605075033.54424-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-12-02	fscrypt: allow deleting files with unsupported encryption policy	Eric Biggers
	Currently it's impossible to delete files that use an unsupported encryption policy, as the kernel will just return an error when performing any operation on the top-level encrypted directory, even just a path lookup into the directory or opening the directory for readdir. More specifically, this occurs in any of the following cases: - The encryption context has an unrecognized version number. Current kernels know about v1 and v2, but there could be more versions in the future. - The encryption context has unrecognized encryption modes (FSCRYPT_MODE_) or flags (FSCRYPT_POLICY_FLAG_), an unrecognized combination of modes, or reserved bits set. - The encryption key has been added and the encryption modes are recognized but aren't available in the crypto API -- for example, a directory is encrypted with FSCRYPT_MODE_ADIANTUM but the kernel doesn't have CONFIG_CRYPTO_ADIANTUM enabled. It's desirable to return errors for most operations on files that use an unsupported encryption policy, but the current behavior is too strict. We need to allow enough to delete files, so that people can't be stuck with undeletable files when downgrading kernel versions. That includes allowing directories to be listed and allowing dentries to be looked up. Fix this by modifying the key setup logic to treat an unsupported encryption policy in the same way as "key unavailable" in the cases that are required for a recursive delete to work: preparing for a readdir or a dentry lookup, revalidating a dentry, or checking whether an inode has the same encryption policy as its parent directory. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201203022041.230976-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-12-02	fscrypt: unexport fscrypt_get_encryption_info()	Eric Biggers
	Now that fscrypt_get_encryption_info() is only called from files in fs/crypto/ (due to all key setup now being handled by higher-level helper functions instead of directly by filesystems), unexport it and move its declaration to fscrypt_private.h. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201203022041.230976-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-11-24	fscrypt: simplify master key locking	Eric Biggers
	The stated reasons for separating fscrypt_master_key::mk_secret_sem from the standard semaphore contained in every 'struct key' no longer apply. First, due to commit a992b20cd4ee ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()"), fscrypt_get_encryption_info() is no longer called from within a filesystem transaction. Second, due to commit d3ec10aa9581 ("KEYS: Don't write out to userspace while holding key semaphore"), the semaphore for the "keyring" key type no longer ranks above page faults. That leaves performance as the only possible reason to keep the separate mk_secret_sem. Specifically, having mk_secret_sem reduces the contention between setup_file_encryption_key() and FS_IOC_{ADD,REMOVE}_ENCRYPTION_KEY. However, these ioctls aren't executed often, so this doesn't seem to be worth the extra complexity. Therefore, simplify the locking design by just using key->sem instead of mk_secret_sem. Link: https://lore.kernel.org/r/20201117032626.320275-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-11-16	fscrypt: remove kernel-internal constants from UAPI header	Eric Biggers
	There isn't really any valid reason to use __FSCRYPT_MODE_MAX or FSCRYPT_POLICY_FLAGS_VALID in a userspace program. These constants are only meant to be used by the kernel internally, and they are defined in the UAPI header next to the mode numbers and flags only so that kernel developers don't forget to update them when adding new modes or flags. In https://lkml.kernel.org/r/20201005074133.1958633-2-satyat@google.com there was an example of someone wanting to use __FSCRYPT_MODE_MAX in a user program, and it was wrong because the program would have broken if __FSCRYPT_MODE_MAX were ever increased. So having this definition available is harmful. FSCRYPT_POLICY_FLAGS_VALID has the same problem. So, remove these definitions from the UAPI header. Replace FSCRYPT_POLICY_FLAGS_VALID with just listing the valid flags explicitly in the one kernel function that needs it. Move __FSCRYPT_MODE_MAX to fscrypt_private.h, remove the double underscores (which were only present to discourage use by userspace), and add a BUILD_BUG_ON() and comments to (hopefully) ensure it is kept in sync. Keep the old name FS_POLICY_FLAGS_VALID, since it's been around for longer and there's a greater chance that removing it would break source compatibility with some program. Indeed, mtd-utils is using it in an #ifdef, and removing it would introduce compiler warnings (about FS_POLICY_FLAGS_PAD_* being redefined) into the mtd-utils build. However, reduce its value to 0x07 so that it only includes the flags with old names (the ones present before Linux 5.4), and try to make it clear that it's now "frozen" and no new flags should be added to it. Fixes: 2336d0deb2d4 ("fscrypt: use FSCRYPT_ prefix for uapi constants") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20201024005132.495952-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-11-06	fscrypt: remove reachable WARN in fscrypt_setup_iv_ino_lblk_32_key()	Eric Biggers
	I_CREATING isn't actually set until the inode has been assigned an inode number and inserted into the inode hash table. So the WARN_ON() in fscrypt_setup_iv_ino_lblk_32_key() is wrong, and it can trigger when creating an encrypted file on ext4. Remove it. This was sometimes causing xfstest generic/602 to fail on ext4. I didn't notice it before because due to a separate oversight, new inodes that haven't been assigned an inode number yet don't necessarily have i_ino == 0 as I had thought, so by chance I never saw the test fail. Fixes: a992b20cd4ee ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()") Reported-by: Theodore Y. Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20201031004556.87862-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22	fscrypt: handle test_dummy_encryption in more logical way	Eric Biggers
	The behavior of the test_dummy_encryption mount option is that when a new file (or directory or symlink) is created in an unencrypted directory, it's automatically encrypted using a dummy encryption policy. That's it; in particular, the encryption (or lack thereof) of existing files (or directories or symlinks) doesn't change. Unfortunately the implementation of test_dummy_encryption is a bit weird and confusing. When test_dummy_encryption is enabled and a file is being created in an unencrypted directory, we set up an encryption key (->i_crypt_info) for the directory. This isn't actually used to do any encryption, however, since the directory is still unencrypted! Instead, ->i_crypt_info is only used for inheriting the encryption policy. One consequence of this is that the filesystem ends up providing a "dummy context" (policy + nonce) instead of a "dummy policy". In commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I mistakenly thought this was required. However, actually the nonce only ends up being used to derive a key that is never used. Another consequence of this implementation is that it allows for 'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge case that can be forgotten about. For example, currently FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the dummy encryption policy when the filesystem is mounted with test_dummy_encryption. That seems like the wrong thing to do, since again, the directory itself is not actually encrypted. Therefore, switch to a more logical and maintainable implementation where the dummy encryption policy inheritance is done without setting up keys for unencrypted directories. This involves: - Adding a function fscrypt_policy_to_inherit() which returns the encryption policy to inherit from a directory. This can be a real policy, a dummy policy, or no policy. - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc. with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc. - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead of an inode. Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22	fscrypt: stop pretending that key setup is nofs-safe	Eric Biggers
	fscrypt_get_encryption_info() has never actually been safe to call in a context that needs GFP_NOFS, since it calls crypto_alloc_skcipher(). crypto_alloc_skcipher() isn't GFP_NOFS-safe, even if called under memalloc_nofs_save(). This is because it may load kernel modules, and also because it internally takes crypto_alg_sem. Other tasks can do GFP_KERNEL allocations while holding crypto_alg_sem for write. The use of fscrypt_init_mutex isn't GFP_NOFS-safe either. So, stop pretending that fscrypt_get_encryption_info() is nofs-safe. I.e., when it allocates memory, just use GFP_KERNEL instead of GFP_NOFS. Note, another reason to do this is that GFP_NOFS is deprecated in favor of using memalloc_nofs_save() in the proper places. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22	fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()	Eric Biggers
	fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But actually it isn't, since it uses functions like crypto_alloc_skcipher() which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save(). Therefore it can deadlock when called from a context that needs GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and f2fs_unlock_op(). This happens when creating a new encrypted file. We can't fix this by just not setting up the key for new inodes right away, since new symlinks need their key to encrypt the symlink target. So we need to set up the new inode's key before starting the transaction. But just calling fscrypt_get_encryption_info() earlier doesn't work, since it assumes the encryption context is already set, and the encryption context can't be set until the transaction. The recently proposed fscrypt support for the ceph filesystem (https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u) will have this same ordering problem too, since ceph will need to encrypt new symlinks before setting their encryption context. Finally, f2fs can deadlock when the filesystem is mounted with '-o test_dummy_encryption' and a new file is created in an existing unencrypted directory. Similarly, this is caused by holding too many locks when calling fscrypt_get_encryption_info(). To solve all these problems, add new helper functions: - fscrypt_prepare_new_inode() sets up a new inode's encryption key (fscrypt_info), using the parent directory's encryption policy and a new random nonce. It neither reads nor writes the encryption context. - fscrypt_set_context() persists the encryption context of a new inode, using the information from the fscrypt_info already in memory. This replaces fscrypt_inherit_context(). Temporarily keep fscrypt_inherit_context() around until all filesystems have been converted to use fscrypt_set_context(). Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21	fscrypt: use smp_load_acquire() for ->i_crypt_info	Eric Biggers
	Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. fscrypt_info includes various sub-objects which are internal to and are allocated by other kernel subsystems such as keyrings and crypto. So by using READ_ONCE() for ->i_crypt_info, we're relying on internal implementation details of these other kernel subsystems. Remove this fragile assumption by using smp_load_acquire() instead. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: e37a784d8b6a ("fscrypt: use READ_ONCE() to access ->i_crypt_info") Link: https://lore.kernel.org/r/20200721225920.114347-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21	fscrypt: use smp_load_acquire() for fscrypt_prepared_key	Eric Biggers
	Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. fscrypt_prepared_key includes a pointer to a crypto_skcipher object, which is internal to and is allocated by the crypto subsystem. By using READ_ONCE() for it, we're relying on internal implementation details of the crypto subsystem. Remove this fragile assumption by using smp_load_acquire() instead. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: 5fee36095cda ("fscrypt: add inline encryption support") Cc: Satya Tangirala <satyat@google.com> Link: https://lore.kernel.org/r/20200721225920.114347-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-20	fscrypt: rename FS_KEY_DERIVATION_NONCE_SIZE	Eric Biggers
	The name "FS_KEY_DERIVATION_NONCE_SIZE" is a bit outdated since due to the addition of FSCRYPT_POLICY_FLAG_DIRECT_KEY, the file nonce may now be used as a tweak instead of for key derivation. Also, we're now prefixing the fscrypt constants with "FSCRYPT_" instead of "FS_". Therefore, rename this constant to FSCRYPT_FILE_NONCE_SIZE. Link: https://lore.kernel.org/r/20200708215722.147154-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08	fscrypt: add inline encryption support	Satya Tangirala
	Add support for inline encryption to fs/crypto/. With "inline encryption", the block layer handles the decryption/encryption as part of the bio, instead of the filesystem doing the crypto itself via Linux's crypto API. This model is needed in order to take advantage of the inline encryption hardware present on most modern mobile SoCs. To use inline encryption, the filesystem needs to be mounted with '-o inlinecrypt'. Blk-crypto will then be used instead of the traditional filesystem-layer crypto whenever possible to encrypt the contents of any encrypted files in that filesystem. Fscrypt still provides the key and IV to use, and the actual ciphertext on-disk is still the same; therefore it's testable using the existing fscrypt ciphertext verification tests. Note that since blk-crypto has a fallback to Linux's crypto API, and also supports all the encryption modes currently supported by fscrypt, this feature is usable and testable even without actual inline encryption hardware. Per-filesystem changes will be needed to set encryption contexts when submitting bios and to implement the 'inlinecrypt' mount option. This patch just adds the common code. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20200702015607.1215430-3-satyat@google.com Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-19	fscrypt: add support for IV_INO_LBLK_32 policies	Eric Biggers
	The eMMC inline crypto standard will only specify 32 DUN bits (a.k.a. IV bits), unlike UFS's 64. IV_INO_LBLK_64 is therefore not applicable, but an encryption format which uses one key per policy and permits the moving of encrypted file contents (as f2fs's garbage collector requires) is still desirable. To support such hardware, add a new encryption format IV_INO_LBLK_32 that makes the best use of the 32 bits: the IV is set to 'SipHash-2-4(inode_number) + file_logical_block_number mod 2^32', where the SipHash key is derived from the fscrypt master key. We hash only the inode number and not also the block number, because we need to maintain contiguity of DUNs to merge bios. Unlike with IV_INO_LBLK_64, with this format IV reuse is possible; this is unavoidable given the size of the DUN. This means this format should only be used where the requirements of the first paragraph apply. However, the hash spreads out the IVs in the whole usable range, and the use of a keyed hash makes it difficult for an attacker to determine which files use which IVs. Besides the above differences, this flag works like IV_INO_LBLK_64 in that on ext4 it is only allowed if the stable_inodes feature has been enabled to prevent inode numbers and the filesystem UUID from changing. Link: https://lore.kernel.org/r/20200515204141.251098-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Paul Crowley <paulcrowley@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-18	fscrypt: support test_dummy_encryption=v2	Eric Biggers
	v1 encryption policies are deprecated in favor of v2, and some new features (e.g. encryption+casefolding) are only being added for v2. Therefore, the "test_dummy_encryption" mount option (which is used for encryption I/O testing with xfstests) needs to support v2 policies. To do this, extend its syntax to be "test_dummy_encryption=v1" or "test_dummy_encryption=v2". The existing "test_dummy_encryption" (no argument) also continues to be accepted, to specify the default setting -- currently v1, but the next patch changes it to v2. To cleanly support both v1 and v2 while also making it easy to support specifying other encryption settings in the future (say, accepting "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a pointer to the dummy fscrypt_context rather than using mount flags. To avoid concurrency issues, don't allow test_dummy_encryption to be set or changed during a remount. (The former restriction is new, but xfstests doesn't run into it, so no one should notice.) Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4, there are two regressions, both of which are test bugs: ext4/023 and ext4/028 fail because they set an xattr and expect it to be stored inline, but the increase in size of the fscrypt_context from 24 to 40 bytes causes this xattr to be spilled into an external block. Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-12	fscrypt: fix all kerneldoc warnings	Eric Biggers
	Fix all kerneldoc warnings in fs/crypto/ and include/linux/fscrypt.h. Most of these were due to missing documentation for function parameters. Detected with: scripts/kernel-doc -v -none fs/crypto/*.{c,h} include/linux/fscrypt.h This cleanup makes it possible to check new patches for kerneldoc warnings without having to filter out all the existing ones. For consistency, also adjust some function "brief descriptions" to include the parentheses and to wrap at 80 characters. (The latter matches the checkpatch expectation.) Link: https://lore.kernel.org/r/20200511191358.53096-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-31	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt	Linus Torvalds
	Pull fscrypt updates from Eric Biggers: "Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's encryption nonce. This makes it easier to write automated tests which verify that fscrypt is doing the encryption correctly" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
2020-03-19	fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl	Eric Biggers
	Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves the nonce from an encrypted file or directory. The nonce is the 16-byte random value stored in the inode's encryption xattr. It is normally used together with the master key to derive the inode's actual encryption key. The nonces are needed by automated tests that verify the correctness of the ciphertext on-disk. Except for the IV_INO_LBLK_64 case, there's no way to replicate a file's ciphertext without knowing that file's nonce. The nonces aren't secret, and the existing ciphertext verification tests in xfstests retrieve them from disk using debugfs or dump.f2fs. But in environments that lack these debugging tools, getting the nonces by manually parsing the filesystem structure would be very hard. To make this important type of testing much easier, let's just add an ioctl that retrieves the nonce. Link: https://lore.kernel.org/r/20200314205052.93294-2-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-07	fscrypt: don't evict dirty inodes after removing key	Eric Biggers
	After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the filesystem and tries to get and put all inodes that were unlocked by the key so that unused inodes get evicted via fscrypt_drop_inode(). Normally, the inodes are all clean due to the sync. However, after the filesystem is sync'ed, userspace can modify and close one of the files. (Userspace is supposed to close the files before removing the key. But it doesn't always happen, and the kernel can't assume it.) This causes the inode to be dirtied and have i_count == 0. Then, fscrypt_drop_inode() failed to consider this case and indicated that the inode can be dropped, causing the write to be lost. On f2fs, other problems such as a filesystem freeze could occur due to the inode being freed while still on f2fs's dirty inode list. Fix this bug by making fscrypt_drop_inode() only drop clean inodes. I've written an xfstest which detects this bug on ext4, f2fs, and ubifs. Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-22	fscrypt: clarify what is meant by a per-file key	Eric Biggers
	Now that there's sometimes a second type of per-file key (the dirhash key), clarify some function names, macros, and documentation that specifically deal with per-file encryption keys. Link: https://lore.kernel.org/r/20200120223201.241390-4-ebiggers@kernel.org Reviewed-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-22	fscrypt: derive dirhash key for casefolded directories	Daniel Rosenberg
	When we allow indexed directories to use both encryption and casefolding, for the dirhash we can't just hash the ciphertext filenames that are stored on-disk (as is done currently) because the dirhash must be case insensitive, but the stored names are case-preserving. Nor can we hash the plaintext names with an unkeyed hash (or a hash keyed with a value stored on-disk like ext4's s_hash_seed), since that would leak information about the names that encryption is meant to protect. Instead, if we can accept a dirhash that's only computable when the fscrypt key is available, we can hash the plaintext names with a keyed hash using a secret key derived from the directory's fscrypt master key. We'll use SipHash-2-4 for this purpose. Prepare for this by deriving a SipHash key for each casefolded encrypted directory. Make sure to handle deriving the key not only when setting up the directory's fscrypt_info, but also in the case where the casefold flag is enabled after the fscrypt_info was already set up. (We could just always derive the key regardless of casefolding, but that would introduce unnecessary overhead for people not using casefolding.) Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved commit message, updated fscrypt.rst, squashed with change that avoids unnecessarily deriving the key, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31	fscrypt: check for appropriate use of DIRECT_KEY flag earlier	Eric Biggers
	FSCRYPT_POLICY_FLAG_DIRECT_KEY is currently only allowed with Adiantum encryption. But FS_IOC_SET_ENCRYPTION_POLICY allowed it in combination with other encryption modes, and an error wasn't reported until later when the encrypted directory was actually used. Fix it to report the error earlier by validating the correct use of the DIRECT_KEY flag in fscrypt_supported_policy(), similar to how we validate the IV_INO_LBLK_64 flag. Link: https://lore.kernel.org/r/20191209211829.239800-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31	fscrypt: verify that the crypto_skcipher has the correct ivsize	Eric Biggers
	As a sanity check, verify that the allocated crypto_skcipher actually has the ivsize that fscrypt is assuming it has. This will always be the case unless there's a bug. But if there ever is such a bug (e.g. like there was in earlier versions of the ESSIV conversion patch [1]) it's preferable for it to be immediately obvious, and not rely on the ciphertext verification tests failing due to uninitialized IV bytes. [1] https://lkml.kernel.org/linux-crypto/20190702215517.GA69157@gmail.com/ Link: https://lore.kernel.org/r/20191209203918.225691-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31	fscrypt: use crypto_skcipher_driver_name()	Eric Biggers
	Crypto API users shouldn't really be accessing struct skcipher_alg directly. <crypto/skcipher.h> already has a function crypto_skcipher_driver_name(), so use that instead. No change in behavior. Link: https://lore.kernel.org/r/20191209203810.225302-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-11-06	fscrypt: add support for IV_INO_LBLK_64 policies	Eric Biggers
	Inline encryption hardware compliant with the UFS v2.1 standard or with the upcoming version of the eMMC standard has the following properties: (1) Per I/O request, the encryption key is specified by a previously loaded keyslot. There might be only a small number of keyslots. (2) Per I/O request, the starting IV is specified by a 64-bit "data unit number" (DUN). IV bits 64-127 are assumed to be 0. The hardware automatically increments the DUN for each "data unit" of configurable size in the request, e.g. for each filesystem block. Property (1) makes it inefficient to use the traditional fscrypt per-file keys. Property (2) precludes the use of the existing DIRECT_KEY fscrypt policy flag, which needs at least 192 IV bits. Therefore, add a new fscrypt policy flag IV_INO_LBLK_64 which causes the encryption to modified as follows: - The encryption keys are derived from the master key, encryption mode number, and filesystem UUID. - The IVs are chosen as (inode_number << 32) \| file_logical_block_num. For filenames encryption, file_logical_block_num is 0. Since the file nonces aren't used in the key derivation, many files may share the same encryption key. This is much more efficient on the target hardware. Including the inode number in the IVs and mixing the filesystem UUID into the keys ensures that data in different files is nevertheless still encrypted differently. Additionally, limiting the inode and block numbers to 32 bits and placing the block number in the low bits maintains compatibility with the 64-bit DUN convention (property (2) above). Since this scheme assumes that inode numbers are stable (which may preclude filesystem shrinking) and that inode and file logical block numbers are at most 32-bit, IV_INO_LBLK_64 will only be allowed on filesystems that meet these constraints. These are acceptable limitations for the cases where this format would actually be used. Note that IV_INO_LBLK_64 is an on-disk format, not an implementation. This patch just adds support for it using the existing filesystem layer encryption. A later patch will add support for inline encryption. Reviewed-by: Paul Crowley <paulcrowley@google.com> Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-11-06	fscrypt: avoid data race on fscrypt_mode::logged_impl_name	Eric Biggers
	The access to logged_impl_name is technically a data race, which tools like KCSAN could complain about in the future. See: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE Fix by using xchg(), which also ensures that only one thread does the logging. This also required switching from bool to int, to avoid a build error on the RISC-V architecture which doesn't implement xchg on bytes. Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-10-21	fscrypt: zeroize fscrypt_info before freeing	Eric Biggers
	memset the struct fscrypt_info to zero before freeing. This isn't really needed currently, since there's no secret key directly in the fscrypt_info. But there's a decent chance that someone will add such a field in the future, e.g. in order to use an API that takes a raw key such as siphash(). So it's good to do this as a hardening measure. Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-10-21	fscrypt: invoke crypto API for ESSIV handling	Eric Biggers
	Instead of open-coding the calculations for ESSIV handling, use an ESSIV skcipher which does all of this under the hood. ESSIV was added to the crypto API in v5.4. This is based on a patch from Ard Biesheuvel, but reworked to apply after all the fscrypt changes that went into v5.4. Tested with 'kvm-xfstests -c ext4,f2fs -g encrypt', including the ciphertext verification tests for v1 and v2 encryption policies. Originally-from: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12	fscrypt: allow unprivileged users to add/remove keys for v2 policies	Eric Biggers
	Allow the FS_IOC_ADD_ENCRYPTION_KEY and FS_IOC_REMOVE_ENCRYPTION_KEY ioctls to be used by non-root users to add and remove encryption keys from the filesystem-level crypto keyrings, subject to limitations. Motivation: while privileged fscrypt key management is sufficient for some users (e.g. Android and Chromium OS, where a privileged process manages all keys), the old API by design also allows non-root users to set up and use encrypted directories, and we don't want to regress on that. Especially, we don't want to force users to continue using the old API, running into the visibility mismatch between files and keyrings and being unable to "lock" encrypted directories. Intuitively, the ioctls have to be privileged since they manipulate filesystem-level state. However, it's actually safe to make them unprivileged if we very carefully enforce some specific limitations. First, each key must be identified by a cryptographic hash so that a user can't add the wrong key for another user's files. For v2 encryption policies, we use the key_identifier for this. v1 policies don't have this, so managing keys for them remains privileged. Second, each key a user adds is charged to their quota for the keyrings service. Thus, a user can't exhaust memory by adding a huge number of keys. By default each non-root user is allowed up to 200 keys; this can be changed using the existing sysctl 'kernel.keys.maxkeys'. Third, if multiple users add the same key, we keep track of those users of the key (of which there remains a single copy), and won't really remove the key, i.e. "lock" the encrypted files, until all those users have removed it. This prevents denial of service attacks that would be possible under simpler schemes, such allowing the first user who added a key to remove it -- since that could be a malicious user who has compromised the key. Of course, encryption keys should be kept secret, but the idea is that using encryption should never be less secure than not using encryption, even if your key was compromised. We tolerate that a user will be unable to really remove a key, i.e. unable to "lock" their encrypted files, if another user has added the same key. But in a sense, this is actually a good thing because it will avoid providing a false notion of security where a key appears to have been removed when actually it's still in memory, available to any attacker who compromises the operating system kernel. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12	fscrypt: v2 encryption policy support	Eric Biggers
	Add a new fscrypt policy version, "v2". It has the following changes from the original policy version, which we call "v1" (): - Master keys (the user-provided encryption keys) are only ever used as input to HKDF-SHA512. This is more flexible and less error-prone, and it avoids the quirks and limitations of the AES-128-ECB based KDF. Three classes of cryptographically isolated subkeys are defined: - Per-file keys, like used in v1 policies except for the new KDF. - Per-mode keys. These implement the semantics of the DIRECT_KEY flag, which for v1 policies made the master key be used directly. These are also planned to be used for inline encryption when support for it is added. - Key identifiers (see below). - Each master key is identified by a 16-byte master_key_identifier, which is derived from the key itself using HKDF-SHA512. This prevents users from associating the wrong key with an encrypted file or directory. This was easily possible with v1 policies, which identified the key by an arbitrary 8-byte master_key_descriptor. - The key must be provided in the filesystem-level keyring, not in a process-subscribed keyring. The following UAPI additions are made: - The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated from fscrypt_policy/fscrypt_policy_v1 by the version code prefix. - A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows getting the v1 or v2 encryption policy of an encrypted file or directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not be used because it did not have a way for userspace to indicate which policy structure is expected. The new ioctl includes a size field, so it is extensible to future fscrypt policy versions. - The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY, and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2 encryption policies. Such keys are kept logically separate from keys for v1 encryption policies, and are identified by 'identifier' rather than by 'descriptor'. The 'identifier' need not be provided when adding a key, since the kernel will calculate it anyway. This patch temporarily keeps adding/removing v2 policy keys behind the same permission check done for adding/removing v1 policy keys: capable(CAP_SYS_ADMIN). However, the next patch will carefully take advantage of the cryptographically secure master_key_identifier to allow non-root users to add/remove v2 policy keys, thus providing a full replacement for v1 policies. () Actually, in the API fscrypt_policy::version is 0 while on-disk fscrypt_context::format is 1. But I believe it makes the most sense to advance both to '2' to have them be in sync, and to consider the numbering to start at 1 except for the API quirk. Reviewed-by: Paul Crowley <paulcrowley@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12	fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl	Eric Biggers
	Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to really remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key secret is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12	fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl	Eric Biggers
	Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an encryption key to the filesystem's fscrypt keyring ->s_master_keys, making any files encrypted with that key appear "unlocked". Why we need this ~~~~~~~~~~~~~~~~ The main problem is that the "locked/unlocked" (ciphertext/plaintext) status of encrypted files is global, but the fscrypt keys are not. fscrypt only looks for keys in the keyring(s) the process accessing the filesystem is subscribed to: the thread keyring, process keyring, and session keyring, where the session keyring may contain the user keyring. Therefore, userspace has to put fscrypt keys in the keyrings for individual users or sessions. But this means that when a process with a different keyring tries to access encrypted files, whether they appear "unlocked" or not is nondeterministic. This is because it depends on whether the files are currently present in the inode cache. Fixing this by consistently providing each process its own view of the filesystem depending on whether it has the key or not isn't feasible due to how the VFS caches work. Furthermore, while sometimes users expect this behavior, it is misguided for two reasons. First, it would be an OS-level access control mechanism largely redundant with existing access control mechanisms such as UNIX file permissions, ACLs, LSMs, etc. Encryption is actually for protecting the data at rest. Second, almost all users of fscrypt actually do need the keys to be global. The largest users of fscrypt, Android and Chromium OS, achieve this by having PID 1 create a "session keyring" that is inherited by every process. This works, but it isn't scalable because it prevents session keyrings from being used for any other purpose. On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't similarly abuse the session keyring, so to make 'sudo' work on all systems it has to link all the user keyrings into root's user keyring [2]. This is ugly and raises security concerns. Moreover it can't make the keys available to system services, such as sshd trying to access the user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to read certificates from the user's home directory (see [5]); or to Docker containers (see [6], [7]). By having an API to add a key to the filesystem we'll be able to fix the above bugs, remove userspace workarounds, and clearly express the intended semantics: the locked/unlocked status of an encrypted directory is global, and encryption is orthogonal to OS-level access control. Why not use the add_key() syscall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use an ioctl for this API rather than the existing add_key() system call because the ioctl gives us the flexibility needed to implement fscrypt-specific semantics that will be introduced in later patches: - Supporting key removal with the semantics such that the secret is removed immediately and any unused inodes using the key are evicted; also, the eviction of any in-use inodes can be retried. - Calculating a key-dependent cryptographic identifier and returning it to userspace. - Allowing keys to be added and removed by non-root users, but only keys for v2 encryption policies; and to prevent denial-of-service attacks, users can only remove keys they themselves have added, and a key is only really removed after all users who added it have removed it. Trying to shoehorn these semantics into the keyrings syscalls would be very difficult, whereas the ioctls make things much easier. However, to reuse code the implementation still uses the keyrings service internally. Thus we get lockless RCU-mode key lookups without having to re-implement it, and the keys automatically show up in /proc/keys for debugging purposes. References: [1] https://github.com/google/fscrypt [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939 [4] https://github.com/google/fscrypt/issues/116 [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715 [6] https://github.com/google/fscrypt/issues/128 [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12	fscrypt: rename keyinfo.c to keysetup.c	Eric Biggers
	Rename keyinfo.c to keysetup.c since this better describes what the file does (sets up the key), and it matches the new file keysetup_v1.c. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>