diff options
Diffstat (limited to 'Documentation/filesystems/bcachefs/casefolding.rst')
-rw-r--r-- | Documentation/filesystems/bcachefs/casefolding.rst | 108 |
1 files changed, 108 insertions, 0 deletions
diff --git a/Documentation/filesystems/bcachefs/casefolding.rst b/Documentation/filesystems/bcachefs/casefolding.rst new file mode 100644 index 000000000000..871a38f557e8 --- /dev/null +++ b/Documentation/filesystems/bcachefs/casefolding.rst @@ -0,0 +1,108 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Casefolding +=========== + +bcachefs has support for case-insensitive file and directory +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`) +casefolding attributes. + +The main usecase for casefolding is compatibility with software written +against other filesystems that rely on casefolded lookups +(eg. NTFS and Wine/Proton). +Taking advantage of file-system level casefolding can lead to great +loading time gains in many applications and games. + +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled. +Once a directory has been flagged for casefolding, a feature bit +is enabled on the superblock which marks the filesystem as using +casefolding. +When the feature bit for casefolding is enabled, it is no longer possible +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled. + +On the lookup/query side: casefolding is implemented by allocating a new +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to +casefold the query string. + +On the dirent side: casefolding is implemented by ensuring the `bkey`'s +hash is made from the casefolded string and storing the cached casefolded +name with the regular name in the dirent. + +The structure looks like this: + +* Regular: [dirent data][regular name][nul][nul]... +* Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]... + +(Do note, the number of NULs here is merely for illustration; their count can +vary per-key, and they may not even be present if the key is aligned to +`sizeof(u64)`.) + +This is efficient as it means that for all file lookups that require casefolding, +it has identical performance to a regular lookup: +a hash comparison and a `memcmp` of the name. + +Rationale +--------- + +Several designs were considered for this system: +One was to introduce a dirent_v2, however that would be painful especially as +the hash system only has support for a single key type. This would also need +`BCH_NAME_MAX` to change between versions, and a new feature bit. + +Another option was to store without the two lengths, and just take the length of +the regular name and casefolded name contiguously / 2 as the length. This would +assume that the regular length == casefolded length, but that could potentially +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than +the lowercase unicode glyph. +It would be possible to disregard the casefold cache for those cases, but it was +decided to simply encode the two string lengths in the key to avoid random +performance issues if this edgecase was ever hit. + +The option settled on was to use a free-bit in d_type to mark a dirent as having +a casefold cache, and then treat the first 4 bytes the name block as lengths. +You can see this in the `d_cf_name_block` member of union in `bch_dirent`. + +The feature bit was used to allow casefolding support to be enabled for the majority +of users, but some allow users who have no need for the feature to still use bcachefs as +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used, +which may be decider between using bcachefs for eg. embedded platforms. + +Other filesystems like ext4 and f2fs have a super-block level option for casefolding +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose +any encodings than a single UTF-8 version. When future encodings are desirable, +they will be added trivially using the opts mechanism. + +dentry/dcache considerations +---------------------------- + +Currently, in casefolded directories, bcachefs (like other filesystems) will not cache +negative dentry's. + +This is because currently doing so presents a problem in the following scenario: + + - Lookup file "blAH" in a casefolded directory + - Creation of file "BLAH" in a casefolded directory + - Lookup file "blAH" in a casefolded directory + +This would fail if negative dentry's were cached. + +This is slightly suboptimal, but could be fixed in future with some vfs work. + + +References +---------- + +(from Peter Anvin, on the list) + +It is worth noting that Microsoft has basically declared their +"recommended" case folding (upcase) table to be permanently frozen (for +new filesystem instances in the case where they use an on-disk +translation table created at format time.) As far as I know they have +never supported anything other than 1:1 conversion of BMP code points, +nor normalization. + +The exFAT specification enumerates the full recommended upcase table, +although in a somewhat annoying format (basically a hex dump of +compressed data): + +https://learn.microsoft.com/en-us/windows/win32/fileio/exfat-specification |