summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-07cachefiles: Implement begin and end I/O operationDavid Howells
Implement the methods for beginning and ending an I/O operation. When called to begin an I/O operation, we are guaranteed that the cookie has reached a certain stage (we're called by fscache after it has done a suitable wait). If a file is available, we paste a ref over into the cache resources for the I/O routines to use. This means that the object can be invalidated whilst the I/O is ongoing without the need to synchronise as the file pointer in the object is replaced, but the file pointer in the cache resources is unaffected. Ending the operation just requires ditching any refs we have and dropping the access guarantee that fscache got for us on the cookie. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819645033.215744.2199344081658268312.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906951916.143852.9531384743995679857.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967161222.1823006.4461476204800357263.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021559030.640689.3684291785218094142.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement backing file wranglingDavid Howells
Implement the wrangling of backing files, including the following pieces: (1) Lookup and creation of a file on disk, using a tmpfile if the file isn't yet present. The file is then opened, sized for DIO and the file handle is attached to the cachefiles_object struct. The inode is marked to indicate that it's in use by a kernel service. (2) Invalidation of an object, creating a tmpfile and switching the file pointer in the cachefiles object. (3) Committing a file to disk, including setting the coherency xattr on it and, if necessary, creating a hard link to it. Note that this would be a good place to use Omar Sandoval's vfs_link() with AT_LINK_REPLACE[1] as I may have to unlink an old file before I can link a tmpfile into place. (4) Withdrawal of open objects when a cache is being withdrawn or a cookie is relinquished. This involves committing or discarding the file. Changes ======= ver #2: - Fix logging of wrong error[1]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20211203094950.GA2480@kili/ [1] Link: https://lore.kernel.org/r/163819644097.215744.4505389616742411239.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906949512.143852.14222856795032602080.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967158526.1823006.17482695321424642675.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021557060.640689.16373541458119269871.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement culling daemon commandsDavid Howells
Implement the ability for the userspace daemon to try and cull a file or directory in the cache. Two daemon commands are implemented: (1) The "inuse" command. This queries if a file is in use or whether it can be deleted. It checks the S_KERNEL_FILE flag on the inode referred to by the specified filename. (2) The "cull" command. This asks for a file or directory to be removed, where removal means either unlinking it or moving it to the graveyard directory for userspace to dismantle. Changes ======= ver #2: - Fix logging of wrong error[1]. - Need to unmark an inode we've moved to the graveyard before unlocking. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20211203094950.GA2480@kili/ [1] Link: https://lore.kernel.org/r/163819643179.215744.13641580295708315695.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906945705.143852.8177595531814485350.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967155792.1823006.1088936326902550910.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021555037.640689.9472627499842585255.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Mark a backing file in use with an inode flagDavid Howells
Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by the kernel to prevent cachefiles or other kernel services from interfering with that file. Using S_SWAPFILE instead isn't really viable as that has other effects in the I/O paths. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819642273.215744.6414248677118690672.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906943215.143852.16972351425323967014.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967154118.1823006.13227551961786743991.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021541207.640689.564689725898537127.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/164021552299.640689.10578652796777392062.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement metadata/coherency data storage in xattrsDavid Howells
Use an xattr on each backing file in the cache to store some metadata, such as the content type and the coherency data. Five content types are defined: (0) No content stored. (1) The file contains a single monolithic blob and must be all or nothing. This would be used for something like an AFS directory or a symlink. (2) The file is populated with content completely up to a point with nothing beyond that. (3) The file has a map attached and is sparsely populated. This would be stored in one or more additional xattrs. (4) The file is dirty, being in the process of local modification and the contents are not necessarily represented correctly by the metadata. The file should be deleted if this is seen on binding. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819641320.215744.16346770087799536862.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906942248.143852.5423738045012094252.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967151734.1823006.9301249989443622576.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021550471.640689.553853918307994335.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement key to filename encodingDavid Howells
Implement a function to encode a binary cookie key as something that can be used as a filename. Four options are considered: (1) All printable chars with no '/' characters. Prepend a 'D' to indicate the encoding but otherwise use as-is. (2) Appears to be an array of __be32. Encode as 'S' plus a list of hex-encoded 32-bit ints separated by commas. If a number is 0, it is rendered as "" instead of "0". (3) Appears to be an array of __le32. Encoded as (2) but with a 'T' encoding prefix. (4) Encoded as base64 with an 'E' prefix plus a second char indicating how much padding is involved. A non-standard base64 encoding is used because '/' cannot be used in the encoded form. If (1) is not possible, whichever of (2), (3) or (4) produces the shortest string is selected (hex-encoding a number may be less dense than base64 encoding it). Note that the prefix characters have to be selected from the set [DEIJST@] lest cachefilesd remove the files because it recognise the name. Changes ======= ver #2: - Fix a short allocation that didn't allow for a string terminator[1] Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/bcefb8f2-576a-b3fc-cc29-89808ebfd7c1@linux.alibaba.com/ [1] Link: https://lore.kernel.org/r/163819640393.215744.15212364106412961104.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906940529.143852.17352132319136117053.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967149827.1823006.6088580775428487961.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021549223.640689.14762875188193982341.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement object lifecycle funcsDavid Howells
Implement allocate, get, see and put functions for the cachefiles_object struct. The members of the struct we're going to need are also added. Additionally, implement a lifecycle tracepoint. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819639457.215744.4600093239395728232.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906939569.143852.3594314410666551982.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967148857.1823006.6332962598220464364.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021547762.640689.8422781599594931000.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Add tracepoints for calls to the VFSDavid Howells
Add tracepoints in cachefiles to monitor when it does various VFS operations, such as mkdir. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819638517.215744.12773133137536579766.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906938316.143852.17227990869551737803.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967147139.1823006.4909879317496543392.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021546287.640689.3501604495002415631.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement volume supportDavid Howells
Implement support for creating the directory layout for a volume on disk and setting up and withdrawing volume caching. Each volume has a directory named for the volume key under the root of the cache (prefixed with an 'I' to indicate to cachefilesd that it's an index) and then creates a bunch of hash bucket subdirectories under that (named as '@' plus a hex number) in which cookie files will be created. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819635314.215744.13081522301564537723.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906936397.143852.17788457778396467161.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967143860.1823006.7185205806080225038.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021545212.640689.5064821392307582927.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement cache registration and withdrawalDavid Howells
Do the following: (1) Fill out cachefiles_daemon_add_cache() so that it sets up the cache directories and registers the cache with cachefiles. (2) Add a function to do the top-level part of cache withdrawal and unregistration. (3) Add a function to sync a cache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819633175.215744.10857127598041268340.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906935445.143852.15545194974036410029.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967142904.1823006.244055483596047072.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021543872.640689.14370017789605073222.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Implement a function to get/create a directory in the cacheDavid Howells
Implement a function to get/create structural directories in the cache. This is used for setting up a cache and creating volume substructures. The directory in memory are marked with the S_KERNEL_FILE inode flag whilst they're in use to tell rmdir to reject attempts to remove them. Changes ======= ver #3: - Return an indication as to whether the directory was freshly created. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819631182.215744.3322471539523262619.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906933130.143852.962088616746509062.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967141952.1823006.7832985646370603833.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021542169.640689.18266858945694357839.stgit@warthog.procyon.org.uk/ # v4
2022-01-07vfs, cachefiles: Mark a backing file in use with an inode flagDavid Howells
Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by the kernel to prevent cachefiles or other kernel services from interfering with that file. Alter rmdir to reject attempts to remove a directory marked with this flag. This is used by cachefiles to prevent cachefilesd from removing them. Using S_SWAPFILE instead isn't really viable as that has other effects in the I/O paths. Changes ======= ver #3: - Check for the object pointer being NULL in the tracepoints rather than the caller. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021541207.640689.564689725898537127.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Provide a function to check how much space there isDavid Howells
Provide a function to check how much space there is. This also flips the state on the cache and will signal the daemon to inform it of the change and to ask it to do some culling if necessary. We will also need to subtract the amount of data currently being written to the cache (cache->b_writing) from the amount of available space to avoid hitting ENOSPC accidentally. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819629322.215744.13457425294680841213.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906930100.143852.1681026700865762069.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967140058.1823006.7781243664702837128.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021539957.640689.12477177372616805706.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Register a miscdev and parse commands over itDavid Howells
Register a misc device with which to talk to the daemon. The misc device holds a cache set up through it around and closing the device kills the cache. cachefilesd communicates with the kernel by passing it single-line text commands. Parse these and use them to parameterise the cache state. This does not implement the command to actually bring a cache online. That's left for later. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819628388.215744.17712097043607299608.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906929128.143852.14065207858943654011.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967139085.1823006.3514846391807454287.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021538400.640689.9172006906288062041.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Add security derivationDavid Howells
Implement code to derive a new set of creds for the cachefiles to use when making VFS or I/O calls and to change the auditing info since the application interacting with the network filesystem is not accessing the cache directly. Cachefiles uses override_creds() to change the effective creds temporarily. set_security_override_from_ctx() is called to derive the LSM 'label' that the cachefiles driver will act with. set_create_files_as() is called to determine the LSM 'label' that will be applied to files and directories created in the cache. These functions alter the new creds. Also implement a couple of functions to wrap the calls to begin/end cred overriding. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819627469.215744.3603633690679962985.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906928172.143852.15886637013364286786.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967138138.1823006.7620933448261939504.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021537001.640689.4081334436031700558.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Add cache error reporting macroDavid Howells
Add a macro to report a cache I/O error and to tell fscache that the cache is in trouble. Also add a pointer to the fscache cache cookie from the cachefiles_cache struct as we need that to pass to fscache_io_error(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819626562.215744.1503690975344731661.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906927235.143852.13694625647880837563.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967137158.1823006.2065038830569321335.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021536053.640689.5306822604644352548.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Add a couple of tracepoints for logging errorsDavid Howells
Add two trace points to log errors, one for vfs operations like mkdir or create, and one for I/O operations, like read, write or truncate. Also add the beginnings of a struct that is going to represent a data file and place a debugging ID in it for the tracepoints to record. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819625632.215744.17907340966178411033.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906926297.143852.18267924605548658911.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967135390.1823006.2512120406360156424.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021534029.640689.1875723624947577095.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Add some error injection supportDavid Howells
Add support for injecting ENOSPC or EIO errors. This needs to be enabled by CONFIG_CACHEFILES_ERROR_INJECTION=y. Once enabled, ENOSPC on things like write and mkdir can be triggered by: echo 1 >/proc/sys/cachefiles/error_injection and EIO can be triggered on most operations by: echo 2 >/proc/sys/cachefiles/error_injection Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819624706.215744.6911916249119962943.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906925343.143852.5465695512984025812.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967134412.1823006.7354285948280296595.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021532340.640689.18209494225772443698.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Define structsDavid Howells
Define the cachefiles_cache struct that's going to carry the cache-level parameters and state of a cache. Define the beginning of the cachefiles_object struct that's going to carry the state for a data storage object. For the moment this is just a debugging ID for logging purposes. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819623690.215744.2824739137193655547.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906924292.143852.15881439716653984905.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967131405.1823006.4480555941533935597.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021530610.640689.846094074334176928.stgit@warthog.procyon.org.uk/ # v4
2022-01-07cachefiles: Introduce rewritten driverDavid Howells
Introduce basic skeleton of the rewritten cachefiles driver including config options so that it can be enabled for compilation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819622766.215744.9108359326983195047.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906923341.143852.3856498104256721447.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967130320.1823006.15791456613198441566.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021528993.640689.9069695476048171884.stgit@warthog.procyon.org.uk/ # v4
2022-01-07fscache: Provide a function to resize a cookieDavid Howells
Provide a function to change the size of the storage attached to a cookie, to match the size of the file being cached when it's changed by truncate or fallocate: void fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size); This acts synchronously and is expected to run under the inode lock of the caller. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819621839.215744.7895597119803515402.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906922387.143852.16394459879816147793.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967128998.1823006.10740669081985775576.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021527861.640689.3466382085497236267.stgit@warthog.procyon.org.uk/ # v4
2022-01-07fscache: Provide a function to note the release of a pageDavid Howells
Provide a function to be called from a network filesystem's releasepage method to indicate that a page has been released that might have been a reflection of data upon the server - and now that data must be reloaded from the server or the cache. This is used to end an optimisation for empty files, in particular files that have just been created locally, whereby we know there cannot yet be any data that we would need to read from the server or the cache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819617128.215744.4725572296135656508.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906920354.143852.7511819614661372008.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967128061.1823006.611781655060034988.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021525963.640689.9264556596205140044.stgit@warthog.procyon.org.uk/ # v4
2022-01-07spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probeMiaoqian Lin
If the probe fails, we should use pm_runtime_disable() to balance pm_runtime_enable(). Add missing pm_runtime_disable() for meson_spifc_probe. Fixes: c3e4bc5434d2 ("spi: meson: Add support for Amlogic Meson SPIFC") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220107075424.7774-1-linmq006@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07spi: atmel: Fix typoQinghua Jin
Change 'actualy' to 'actually' Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com> Link: https://lore.kernel.org/r/20220107024631.396862-1-qhjin.dev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07regulator: Add MAX20086-MAX20089 driverWatson Chow
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add a driver that supports controlling the outputs individually. Additional features, such as overcurrent detection, may be added later if needed. Signed-off-by: Watson Chow <watson.chow@avnet.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-3-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07dt-bindings: regulators: Add bindings for Maxim MAX20086-MAX20089Laurent Pinchart
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add corresponding DT bindings. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-2-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07btrfs: output more debug messages for uncommitted transactionQu Wenruo
Print extra information about how many dirty bytes an uncommitted has at the end of mount. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: respect the max size in the header when activating swap fileFilipe Manana
If we extended the size of a swapfile after its header was created (by the mkswap utility) and then try to activate it, we will map the entire file when activating the swap file, instead of limiting to the max size defined in the swap file's header. Currently test case generic/643 from fstests fails because we do not respect that size limit defined in the swap file's header. So fix this by not mapping file ranges beyond the max size defined in the swap header. This is the same type of bug that iomap used to have, and was fixed in commit 36ca7943ac18ae ("mm/swap: consider max pages in iomap_swapfile_add_extent"). Fixes: ed46ff3d423780 ("Btrfs: support swap files") CC: stable@vger.kernel.org # 5.4+ Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: fix argument list that the kdoc format and script verifiedYang Li
The warnings were found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/btrfs/extent_io.c:3210: warning: Function parameter or member 'bio_ctrl' not described in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'bio' description in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'prev_bio_flags' description in 'btrfs_bio_add_page' fs/btrfs/space-info.c:1602: warning: Excess function parameter 'root' description in 'btrfs_reserve_metadata_bytes' fs/btrfs/space-info.c:1602: warning: Function parameter or member 'fs_info' not described in 'btrfs_reserve_metadata_bytes' Note: this is fixing only the warnings regarding parameter list, the first line is not strictly conforming to the kdoc format as the btrfs codebase does not stick to that and keeps the first line more free form (because it's only for internal use). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: remove unnecessary parameter type from compression_decompress_bioSu Yue
btrfs_decompress_bio, the only caller of compression_decompress_bio gets type from @cb and passes it to compression_decompress_bio. However, compression_decompress_bio can get compression type directly from @cb. So remove the parameter and access it through @cb. No functional change. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: selftests: dump extent io tree if extent-io-tree test failedQu Wenruo
When code modifying extent-io-tree get modified and got that selftest failed, it can take some time to pin down the cause. To make it easier to expose the problem, dump the extent io tree if the selftest failed. This can save developers debug time, especially since the selftest we can not use the trace events, thus have to manually add debug trace points. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: cleanup the argument list of scrub_stripe()Qu Wenruo
The argument list of btrfs_stripe() has similar problems of scrub_chunk(): - Duplicated and ambiguous @base argument Can be fetched from btrfs_block_group::bg. - Ambiguous argument @length It's again device extent length - Ambiguous argument @num The instinctive guess would be mirror number, but in fact it's stripe index. Fix it by: - Remove @base parameter - Rename @length to @dev_extent_len - Rename @num to @stripe_index Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: cleanup the argument list of scrub_chunk()Qu Wenruo
The argument list of scrub_chunk() has the following problems: - Duplicated @chunk_offset It is the same as btrfs_block_group::start. - Confusing @length The most instinctive guess is chunk length, and one may want to delete it, but the truth is, it's the device extent length. Fix this by: - Remove @chunk_offset Use btrfs_block_group::start instead. - Rename @length to @dev_extent_len Also rename the caller to remove the ambiguous naming. - Rename @cache to @bg The "_cache" suffix for btrfs_block_group has been removed for a while. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: remove reada infrastructureQu Wenruo
Currently there is only one user for btrfs metadata readahead, and that's scrub. But even for the single user, it's not providing the correct functionality it needs, as scrub needs reada for commit root, which current readahead can't provide. (Although it's pretty easy to add such feature). Despite this, there are some extra problems related to metadata readahead: - Duplicated feature with btrfs_path::reada - Partly duplicated feature of btrfs_fs_info::buffer_radix Btrfs already caches its metadata in buffer_radix, while readahead tries to read the tree block no matter if it's already cached. - Poor layer separation Metadata readahead works kinda at device level. This is definitely not the correct layer it should be, since metadata is at btrfs logical address space, it should not bother device at all. This brings extra chance for bugs to sneak in, while brings unnecessary complexity. - Dead code In the very beginning of scrub.c we have #undef DEBUG, rendering all the debug related code useless and unable to test. Thus here I purpose to remove the metadata readahead mechanism completely. [BENCHMARK] There is a full benchmark for the scrub performance difference using the old btrfs_reada_add() and btrfs_path::reada. For the worst case (no dirty metadata, slow HDD), there could be a 5% performance drop for scrub. For other cases (even SATA SSD), there is no distinguishable performance difference. The number is reported scrub speed, in MiB/s. The resolution is limited by the reported duration, which only has a resolution of 1 second. Old New Diff SSD 455.3 466.332 +2.42% HDD 103.927 98.012 -5.69% Comprehensive test methodology is in the cover letter of the patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: use btrfs_path::reada for extent tree readaheadQu Wenruo
For scrub, we trigger two readaheads for two trees, extent tree to get where to scrub, and csum tree to get the data checksum. For csum tree we already trigger readahead in btrfs_lookup_csums_range(), by setting path->reada. But for extent tree we don't have any path based readahead. Add the readahead for extent tree as well, so we can later remove the btrfs_reada_add() based readahead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()Qu Wenruo
In function scrub_stripe() we allocated two btrfs_path's, one @path for extent tree search and another @ppath for full stripe extent tree search for RAID56. This is totally umncessary, as the @ppath usage is completely inside scrub_raid56_parity(), thus we can move the path allocation into scrub_raid56_parity() completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: refactor unlock_upNikolay Borisov
The purpose of this function is to unlock all nodes in a btrfs path which are above 'lowest_unlock' and whose slot used is different than 0. As such it used slightly awkward structure of 'if' as well as somewhat cryptic "no_skip" control variable which denotes whether we should check the current level of skipability or no. This patch does the following (cosmetic) refactorings: * Renames 'no_skip' to 'check_skip' and makes it a boolean. This variable controls whether we are below the lowest_unlock/skip_level levels. * Consolidates the 2 conditions which warrant checking whether the current level should be skipped under 1 common if (check_skip) branch, this increase indentation level but is not critical. * Consolidates the 'skip_level < i && i >= lowest_unlock' and 'i >= lowest_unlock && i > skip_level' condition into a common branch since those are identical. * Eliminates the local extent_buffer variable as in this case it doesn't bring anything to function readability. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: skip transaction commit after failure to create subvolumeFilipe Manana
At ioctl.c:create_subvol(), when we fail to create a subvolume we always commit the transaction. In most cases this is a no-op, since all the error paths, except for one, abort the transaction - the only exception is when we fail to insert the new root item into the root tree, in that case we don't abort the transaction because we didn't do anything that is irreversible - however we end up committing the transaction which although is not a functional problem, it adds unnecessary rotation of the backup roots in the superblock and unnecessary work. So change that to commit a transaction only when no error happened, otherwise just call btrfs_end_transaction() to release our reference on the transaction. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: fix chunk allocation condition for zoned allocatorNaohiro Aota
The ZNS specification defines a limit on the number of "active" zones. That limit impose us to limit the number of block groups which can be used for an allocation at the same time. Not to exceed the limit, we reuse the existing active block groups as much as possible when we can't activate any other zones without sacrificing an already activated block group in commit a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space"). However, the check is wrong in two ways. First, it checks the condition for every raid index (ffe_ctl->index). Even if it reaches the condition and "ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size" is met, there can be other block groups having enough space to hold ffe_ctl->num_bytes. (Actually, this won't happen in the current zoned code as it only supports SINGLE profile. But, it can happen once it enables other RAID types.) Second, it checks the active zone availability depending on the raid index. The raid index is just an index for space_info->block_groups, so it has nothing to do with chunk allocation. These mistakes are causing a faulty allocation in a certain situation. Consider we are running zoned btrfs on a device whose max_active_zone == 0 (no limit). And, suppose no block group have a room to fit ffe_ctl->num_bytes but some room to meet ffe_ctl->min_alloc_size (i.e. max_extent_size > num_bytes >= min_alloc_size). In this situation, the following occur: - With SINGLE raid_index, it reaches the chunk allocation checking code - The check returns true because we can activate a new zone (no limit) - But, before allocating the chunk, it iterates to the next raid index (RAID5) - Since there are no RAID5 block groups on zoned mode, it again reaches the check code - The check returns false because of btrfs_can_activate_zone()'s "if (raid_index != BTRFS_RAID_SINGLE)" part - That results in returning -ENOSPC without allocating a new chunk As a result, we end up hitting -ENOSPC too early. Move the check to the right place in the can_allocate_chunk() hook, and do the active zone check depending on the allocation flag, not on the raid index. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: add extent allocator hook to decide to allocate chunk or notNaohiro Aota
Introduce a new hook for an extent allocator policy. With the new hook, a policy can decide to allocate a new block group or not. If not, it will return -ENOSPC, so btrfs_reserve_extent() will cut the allocation size in half and retry the allocation if min_alloc_size is large enough. The hook has a place holder and will be replaced with the real implementation in the next patch. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: unset dedicated block group on allocation failureNaohiro Aota
Allocating an extent from a block group can fail for various reasons. When an allocation from a dedicated block group (for tree-log or relocation data) fails, we need to unregister it as a dedicated one so that we can allocate a new block group for the dedicated one. However, we are returning early when the block group in case it is read-only, fully used, or not be able to activate the zone. As a result, we keep the non-usable block group as a dedicated one, leading to further allocation failure. With many block groups, the allocator will iterate hopeless loop to find a free extent, results in a hung task. Fix the issue by delaying the return and doing the proper cleanups. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zonedJohannes Thumshirn
REQ_OP_ZONE_APPEND can only work on zoned devices, so it is redundant to check if the filesystem is zoned when REQ_OP_ZONE_APPEND is set as the bio's bio_op. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: sink zone check into btrfs_repair_one_zoneJohannes Thumshirn
Sink zone check into btrfs_repair_one_zone() so we don't need to do it in all callers. Also as btrfs_repair_one_zone() doesn't return a sensible error, make it a boolean function and return false in case it got called on a non-zoned filesystem and true on a zoned filesystem. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: simplify btrfs_check_meta_write_pointerJohannes Thumshirn
btrfs_check_meta_write_pointer() will always be called with a NULL 'cache_ret' argument. As there's no need to check if we have a valid block_group passed in remove these checks. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: encapsulate inode locking for zoned relocationJohannes Thumshirn
Encapsulate the inode lock needed for serializing the data relocation writes on a zoned filesystem into a helper. This streamlines the code reading flow and hides special casing for zoned filesystems. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the deviceAnand Jain
In the case of the seed device, the fsid can be different from the mounted sprout fsid. The userland has to read the device superblock to know the fsid but, that idea fails if the device is missing. So add a sysfs interface devinfo/<devid>/fsid to show the fsid of the device. For example: $ cd /sys/fs/btrfs/b10b02a5-f9de-4276-b9e8-2bfd09a578a8 $ cat devinfo/1/fsid c44d771f-639d-4df3-99ec-5bc7ad2af93b $ cat devinfo/3/fsid b10b02a5-f9de-4276-b9e8-2bfd09a578a8 Though it's related to seeding, the name of the sysfs file is plain fsid as it matches what blkid says. A path to the device's fsid will aid scripting. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: reserve extra space for the free space treeJosef Bacik
Filipe reported a problem where sometimes he'd get an ENOSPC abort when running delayed refs with generic/619 and the free space tree enabled. This is partly because we do not reserve space for modifying the free space tree, nor do we have a block rsv associated with that tree. The delayed_refs_rsv tracks the amount of space required to run delayed refs. This means 1 modification means 1 change to the extent root. With the free space tree this turns into 2 changes, because modifying 1 extent means updating the extent tree and potentially updating the free space tree to either remove that entry or add the free space. Thus if we have the FST enabled, simply double the reservation size for our modification. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: include the free space tree in the global rsv minimum calculationJosef Bacik
Filipe reported a problem where generic/619 was failing with an ENOSPC abort while running delayed refs, like the following BTRFS: Transaction aborted (error -28) WARNING: CPU: 3 PID: 522920 at fs/btrfs/free-space-tree.c:1049 add_to_free_space_tree+0xe5/0x110 [btrfs] CPU: 3 PID: 522920 Comm: kworker/u16:19 Tainted: G W 5.16.0-rc2-btrfs-next-106 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] RIP: 0010:add_to_free_space_tree+0xe5/0x110 [btrfs] RSP: 0000:ffffa65087fb7b20 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff9131eeaa RDI: 00000000ffffffff RBP: ffff8d62e26481b8 R08: ffffffff9ad97ce0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe4 R13: ffff8d61c25fe688 R14: ffff8d61ebd88800 R15: ffff8d61ebd88a90 FS: 0000000000000000(0000) GS:ffff8d64ed400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa46a8b1000 CR3: 0000000148d18003 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __btrfs_free_extent+0x516/0x950 [btrfs] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs] btrfs_run_delayed_refs+0x86/0x210 [btrfs] flush_space+0x403/0x630 [btrfs] ? call_rcu_tasks_generic+0x50/0x80 ? lock_release+0x223/0x4a0 ? btrfs_get_alloc_profile+0xb5/0x290 [btrfs] ? do_raw_spin_unlock+0x4b/0xa0 btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs] process_one_work+0x24c/0x5b0 worker_thread+0x55/0x3c0 ? process_one_work+0x5b0/0x5b0 kthread+0x17c/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 There's a couple of reasons for this, but in generic/619's case the largest reason is because it is a very small file system, ad we do not reserve enough space for the global reserve. With the free space tree we now have the free space tree that we need to modify when running delayed refs. This means we need the global reserve to take this into account when it calculates the minimum size it needs to be. This is especially important for very small file systems. Fix this by adjusting the minimum global block rsv size math to include the size of the free space tree when calculating the size. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: merge SCRUB_PAGES_PER_RD_BIO and SCRUB_PAGES_PER_WR_BIOQu Wenruo
These two values were introduced in commit ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk") as an optimization. But the truth is, block layer scheduler can do whatever it wants to merge/split bios to improve performance. Doing such "optimization" is not really going to affect much, especially considering how good current block layer optimizations are doing. Remove such old and immature optimization from our code. Since we're here, also change BUG_ON()s using these two macros to use ASSERT()s. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: update SCRUB_MAX_PAGES_PER_BLOCKQu Wenruo
Use BTRFS_MAX_METADATA_BLOCKSIZE and SZ_4K (minimal sectorsize) to calculate this value. And remove one stale comment on the value, in fact with recent subpage support, BTRFS_MAX_METADATA_BLOCKSIZE * PAGE_SIZE is already beyond BTRFS_STRIPE_LEN, just we don't use the full page. Also since we're here, update the BUG_ON() related to SCRUB_MAX_PAGES_PER_BLOCK to ASSERT(). As those ASSERT() are really only for developers to catch early obvious bugs, not to let end users suffer. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>