diff options
Diffstat (limited to 'Documentation/filesystems/bcachefs')
-rw-r--r-- | Documentation/filesystems/bcachefs/CodingStyle.rst | 186 | ||||
-rw-r--r-- | Documentation/filesystems/bcachefs/SubmittingPatches.rst | 105 | ||||
-rw-r--r-- | Documentation/filesystems/bcachefs/casefolding.rst | 108 | ||||
-rw-r--r-- | Documentation/filesystems/bcachefs/future/idle_work.rst | 78 | ||||
-rw-r--r-- | Documentation/filesystems/bcachefs/index.rst | 29 |
5 files changed, 505 insertions, 1 deletions
diff --git a/Documentation/filesystems/bcachefs/CodingStyle.rst b/Documentation/filesystems/bcachefs/CodingStyle.rst new file mode 100644 index 000000000000..b29562a6bf55 --- /dev/null +++ b/Documentation/filesystems/bcachefs/CodingStyle.rst @@ -0,0 +1,186 @@ +.. SPDX-License-Identifier: GPL-2.0 + +bcachefs coding style +===================== + +Good development is like gardening, and codebases are our gardens. Tend to them +every day; look for little things that are out of place or in need of tidying. +A little weeding here and there goes a long way; don't wait until things have +spiraled out of control. + +Things don't always have to be perfect - nitpicking often does more harm than +good. But appreciate beauty when you see it - and let people know. + +The code that you are afraid to touch is the code most in need of refactoring. + +A little organizing here and there goes a long way. + +Put real thought into how you organize things. + +Good code is readable code, where the structure is simple and leaves nowhere +for bugs to hide. + +Assertions are one of our most important tools for writing reliable code. If in +the course of writing a patchset you encounter a condition that shouldn't +happen (and will have unpredictable or undefined behaviour if it does), or +you're not sure if it can happen and not sure how to handle it yet - make it a +BUG_ON(). Don't leave undefined or unspecified behavior lurking in the codebase. + +By the time you finish the patchset, you should understand better which +assertions need to be handled and turned into checks with error paths, and +which should be logically impossible. Leave the BUG_ON()s in for the ones which +are logically impossible. (Or, make them debug mode assertions if they're +expensive - but don't turn everything into a debug mode assertion, so that +we're not stuck debugging undefined behaviour should it turn out that you were +wrong). + +Assertions are documentation that can't go out of date. Good assertions are +wonderful. + +Good assertions drastically and dramatically reduce the amount of testing +required to shake out bugs. + +Good assertions are based on state, not logic. To write good assertions, you +have to think about what the invariants on your state are. + +Good invariants and assertions will hold everywhere in your codebase. This +means that you can run them in only a few places in the checked in version, but +should you need to debug something that caused the assertion to fail, you can +quickly shotgun them everywhere to find the codepath that broke the invariant. + +A good assertion checks something that the compiler could check for us, and +elide - if we were working in a language with embedded correctness proofs that +the compiler could check. This is something that exists today, but it'll likely +still be a few decades before it comes to systems programming languages. But we +can still incorporate that kind of thinking into our code and document the +invariants with runtime checks - much like the way people working in +dynamically typed languages may add type annotations, gradually making their +code statically typed. + +Looking for ways to make your assertions simpler - and higher level - will +often nudge you towards making the entire system simpler and more robust. + +Good code is code where you can poke around and see what it's doing - +introspection. We can't debug anything if we can't see what's going on. + +Whenever we're debugging, and the solution isn't immediately obvious, if the +issue is that we don't know where the issue is because we can't see what's +going on - fix that first. + +We have the tools to make anything visible at runtime, efficiently - RCU and +percpu data structures among them. Don't let things stay hidden. + +The most important tool for introspection is the humble pretty printer - in +bcachefs, this means `*_to_text()` functions, which output to printbufs. + +Pretty printers are wonderful, because they compose and you can use them +everywhere. Having functions to print whatever object you're working with will +make your error messages much easier to write (therefore they will actually +exist) and much more informative. And they can be used from sysfs/debugfs, as +well as tracepoints. + +Runtime info and debugging tools should come with clear descriptions and +labels, and good structure - we don't want files with a list of bare integers, +like in procfs. Part of the job of the debugging tools is to educate users and +new developers as to how the system works. + +Error messages should, whenever possible, tell you everything you need to debug +the issue. It's worth putting effort into them. + +Tracepoints shouldn't be the first thing you reach for. They're an important +tool, but always look for more immediate ways to make things visible. When we +have to rely on tracing, we have to know which tracepoints we're looking for, +and then we have to run the troublesome workload, and then we have to sift +through logs. This is a lot of steps to go through when a user is hitting +something, and if it's intermittent it may not even be possible. + +The humble counter is an incredibly useful tool. They're cheap and simple to +use, and many complicated internal operations with lots of things that can +behave weirdly (anything involving memory reclaim, for example) become +shockingly easy to debug once you have counters on every distinct codepath. + +Persistent counters are even better. + +When debugging, try to get the most out of every bug you come across; don't +rush to fix the initial issue. Look for things that will make related bugs +easier the next time around - introspection, new assertions, better error +messages, new debug tools, and do those first. Look for ways to make the system +better behaved; often one bug will uncover several other bugs through +downstream effects. + +Fix all that first, and then the original bug last - even if that means keeping +a user waiting. They'll thank you in the long run, and when they understand +what you're doing you'll be amazed at how patient they're happy to be. Users +like to help - otherwise they wouldn't be reporting the bug in the first place. + +Talk to your users. Don't isolate yourself. + +Users notice all sorts of interesting things, and by just talking to them and +interacting with them you can benefit from their experience. + +Spend time doing support and helpdesk stuff. Don't just write code - code isn't +finished until it's being used trouble free. + +This will also motivate you to make your debugging tools as good as possible, +and perhaps even your documentation, too. Like anything else in life, the more +time you spend at it the better you'll get, and you the developer are the +person most able to improve the tools to make debugging quick and easy. + +Be wary of how you take on and commit to big projects. Don't let development +become product-manager focused. Often time an idea is a good one but needs to +wait for its proper time - but you won't know if it's the proper time for an +idea until you start writing code. + +Expect to throw a lot of things away, or leave them half finished for later. +Nobody writes all perfect code that all gets shipped, and you'll be much more +productive in the long run if you notice this early and shift to something +else. The experience gained and lessons learned will be valuable for all the +other work you do. + +But don't be afraid to tackle projects that require significant rework of +existing code. Sometimes these can be the best projects, because they can lead +us to make existing code more general, more flexible, more multipurpose and +perhaps more robust. Just don't hesitate to abandon the idea if it looks like +it's going to make a mess of things. + +Complicated features can often be done as a series of refactorings, with the +final change that actually implements the feature as a quite small patch at the +end. It's wonderful when this happens, especially when those refactorings are +things that improve the codebase in their own right. When that happens there's +much less risk of wasted effort if the feature you were going for doesn't work +out. + +Always strive to work incrementally. Always strive to turn the big projects +into little bite sized projects that can prove their own merits. + +Instead of always tackling those big projects, look for little things that +will be useful, and make the big projects easier. + +The question of what's likely to be useful is where junior developers most +often go astray - doing something because it seems like it'll be useful often +leads to overengineering. Knowing what's useful comes from many years of +experience, or talking with people who have that experience - or from simply +reading lots of code and looking for common patterns and issues. Don't be +afraid to throw things away and do something simpler. + +Talk about your ideas with your fellow developers; often times the best things +come from relaxed conversations where people aren't afraid to say "what if?". + +Don't neglect your tools. + +The most important tools (besides the compiler and our text editor) are the +tools we use for testing. The shortest possible edit/test/debug cycle is +essential for working productively. We learn, gain experience, and discover the +errors in our thinking by running our code and seeing what happens. If your +time is being wasted because your tools are bad or too slow - don't accept it, +fix it. + +Put effort into your documentation, commit messages, and code comments - but +don't go overboard. A good commit message is wonderful - but if the information +was important enough to go in a commit message, ask yourself if it would be +even better as a code comment. + +A good code comment is wonderful, but even better is the comment that didn't +need to exist because the code was so straightforward as to be obvious; +organized into small clean and tidy modules, with clear and descriptive names +for functions and variables, where every line of code has a clear purpose. diff --git a/Documentation/filesystems/bcachefs/SubmittingPatches.rst b/Documentation/filesystems/bcachefs/SubmittingPatches.rst new file mode 100644 index 000000000000..18c79d548391 --- /dev/null +++ b/Documentation/filesystems/bcachefs/SubmittingPatches.rst @@ -0,0 +1,105 @@ +Submitting patches to bcachefs +============================== + +Here are suggestions for submitting patches to bcachefs subsystem. + +Submission checklist +-------------------- + +Patches must be tested before being submitted, either with the xfstests suite +[0]_, or the full bcachefs test suite in ktest [1]_, depending on what's being +touched. Note that ktest wraps xfstests and will be an easier method to running +it for most users; it includes single-command wrappers for all the mainstream +in-kernel local filesystems. + +Patches will undergo more testing after being merged (including +lockdep/kasan/preempt/etc. variants), these are not generally required to be +run by the submitter - but do put some thought into what you're changing and +which tests might be relevant, e.g. are you dealing with tricky memory layout +work? kasan, are you doing locking work? then lockdep; and ktest includes +single-command variants for the debug build types you'll most likely need. + +The exception to this rule is incomplete WIP/RFC patches: if you're working on +something nontrivial, it's encouraged to send out a WIP patch to let people +know what you're doing and make sure you're on the right track. Just make sure +it includes a brief note as to what's done and what's incomplete, to avoid +confusion. + +Rigorous checkpatch.pl adherence is not required (many of its warnings are +considered out of date), but try not to deviate too much without reason. + +Focus on writing code that reads well and is organized well; code should be +aesthetically pleasing. + +CI +-- + +Instead of running your tests locally, when running the full test suite it's +preferable to let a server farm do it in parallel, and then have the results +in a nice test dashboard (which can tell you which failures are new, and +presents results in a git log view, avoiding the need for most bisecting). + +That exists [2]_, and community members may request an account. If you work for +a big tech company, you'll need to help out with server costs to get access - +but the CI is not restricted to running bcachefs tests: it runs any ktest test +(which generally makes it easy to wrap other tests that can run in qemu). + +Other things to think about +--------------------------- + +- How will we debug this code? Is there sufficient introspection to diagnose + when something starts acting wonky on a user machine? + + We don't necessarily need every single field of every data structure visible + with introspection, but having the important fields of all the core data + types wired up makes debugging drastically easier - a bit of thoughtful + foresight greatly reduces the need to have people build custom kernels with + debug patches. + + More broadly, think about all the debug tooling that might be needed. + +- Does it make the codebase more or less of a mess? Can we also try to do some + organizing, too? + +- Do new tests need to be written? New assertions? How do we know and verify + that the code is correct, and what happens if something goes wrong? + + We don't yet have automated code coverage analysis or easy fault injection - + but for now, pretend we did and ask what they might tell us. + + Assertions are hugely important, given that we don't yet have a systems + language that can do ergonomic embedded correctness proofs. Hitting an assert + in testing is much better than wandering off into undefined behaviour la-la + land - use them. Use them judiciously, and not as a replacement for proper + error handling, but use them. + +- Does it need to be performance tested? Should we add new performance counters? + + bcachefs has a set of persistent runtime counters which can be viewed with + the 'bcachefs fs top' command; this should give users a basic idea of what + their filesystem is currently doing. If you're doing a new feature or looking + at old code, think if anything should be added. + +- If it's a new on disk format feature - have upgrades and downgrades been + tested? (Automated tests exists but aren't in the CI, due to the hassle of + disk image management; coordinate to have them run.) + +Mailing list, IRC +----------------- + +Patches should hit the list [3]_, but much discussion and code review happens +on IRC as well [4]_; many people appreciate the more conversational approach +and quicker feedback. + +Additionally, we have a lively user community doing excellent QA work, which +exists primarily on IRC. Please make use of that resource; user feedback is +important for any nontrivial feature, and documenting it in commit messages +would be a good idea. + +.. rubric:: References + +.. [0] git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git +.. [1] https://evilpiepirate.org/git/ktest.git/ +.. [2] https://evilpiepirate.org/~testdashboard/ci/ +.. [3] linux-bcachefs@vger.kernel.org +.. [4] irc.oftc.net#bcache, #bcachefs-dev diff --git a/Documentation/filesystems/bcachefs/casefolding.rst b/Documentation/filesystems/bcachefs/casefolding.rst new file mode 100644 index 000000000000..871a38f557e8 --- /dev/null +++ b/Documentation/filesystems/bcachefs/casefolding.rst @@ -0,0 +1,108 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Casefolding +=========== + +bcachefs has support for case-insensitive file and directory +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`) +casefolding attributes. + +The main usecase for casefolding is compatibility with software written +against other filesystems that rely on casefolded lookups +(eg. NTFS and Wine/Proton). +Taking advantage of file-system level casefolding can lead to great +loading time gains in many applications and games. + +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled. +Once a directory has been flagged for casefolding, a feature bit +is enabled on the superblock which marks the filesystem as using +casefolding. +When the feature bit for casefolding is enabled, it is no longer possible +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled. + +On the lookup/query side: casefolding is implemented by allocating a new +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to +casefold the query string. + +On the dirent side: casefolding is implemented by ensuring the `bkey`'s +hash is made from the casefolded string and storing the cached casefolded +name with the regular name in the dirent. + +The structure looks like this: + +* Regular: [dirent data][regular name][nul][nul]... +* Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]... + +(Do note, the number of NULs here is merely for illustration; their count can +vary per-key, and they may not even be present if the key is aligned to +`sizeof(u64)`.) + +This is efficient as it means that for all file lookups that require casefolding, +it has identical performance to a regular lookup: +a hash comparison and a `memcmp` of the name. + +Rationale +--------- + +Several designs were considered for this system: +One was to introduce a dirent_v2, however that would be painful especially as +the hash system only has support for a single key type. This would also need +`BCH_NAME_MAX` to change between versions, and a new feature bit. + +Another option was to store without the two lengths, and just take the length of +the regular name and casefolded name contiguously / 2 as the length. This would +assume that the regular length == casefolded length, but that could potentially +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than +the lowercase unicode glyph. +It would be possible to disregard the casefold cache for those cases, but it was +decided to simply encode the two string lengths in the key to avoid random +performance issues if this edgecase was ever hit. + +The option settled on was to use a free-bit in d_type to mark a dirent as having +a casefold cache, and then treat the first 4 bytes the name block as lengths. +You can see this in the `d_cf_name_block` member of union in `bch_dirent`. + +The feature bit was used to allow casefolding support to be enabled for the majority +of users, but some allow users who have no need for the feature to still use bcachefs as +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used, +which may be decider between using bcachefs for eg. embedded platforms. + +Other filesystems like ext4 and f2fs have a super-block level option for casefolding +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose +any encodings than a single UTF-8 version. When future encodings are desirable, +they will be added trivially using the opts mechanism. + +dentry/dcache considerations +---------------------------- + +Currently, in casefolded directories, bcachefs (like other filesystems) will not cache +negative dentry's. + +This is because currently doing so presents a problem in the following scenario: + + - Lookup file "blAH" in a casefolded directory + - Creation of file "BLAH" in a casefolded directory + - Lookup file "blAH" in a casefolded directory + +This would fail if negative dentry's were cached. + +This is slightly suboptimal, but could be fixed in future with some vfs work. + + +References +---------- + +(from Peter Anvin, on the list) + +It is worth noting that Microsoft has basically declared their +"recommended" case folding (upcase) table to be permanently frozen (for +new filesystem instances in the case where they use an on-disk +translation table created at format time.) As far as I know they have +never supported anything other than 1:1 conversion of BMP code points, +nor normalization. + +The exFAT specification enumerates the full recommended upcase table, +although in a somewhat annoying format (basically a hex dump of +compressed data): + +https://learn.microsoft.com/en-us/windows/win32/fileio/exfat-specification diff --git a/Documentation/filesystems/bcachefs/future/idle_work.rst b/Documentation/filesystems/bcachefs/future/idle_work.rst new file mode 100644 index 000000000000..59a332509dcd --- /dev/null +++ b/Documentation/filesystems/bcachefs/future/idle_work.rst @@ -0,0 +1,78 @@ +Idle/background work classes design doc: + +Right now, our behaviour at idle isn't ideal, it was designed for servers that +would be under sustained load, to keep pending work at a "medium" level, to +let work build up so we can process it in more efficient batches, while also +giving headroom for bursts in load. + +But for desktops or mobile - scenarios where work is less sustained and power +usage is more important - we want to operate differently, with a "rush to +idle" so the system can go to sleep. We don't want to be dribbling out +background work while the system should be idle. + +The complicating factor is that there are a number of background tasks, which +form a heirarchy (or a digraph, depending on how you divide it up) - one +background task may generate work for another. + +Thus proper idle detection needs to model this heirarchy. + +- Foreground writes +- Page cache writeback +- Copygc, rebalance +- Journal reclaim + +When we implement idle detection and rush to idle, we need to be careful not +to disturb too much the existing behaviour that works reasonably well when the +system is under sustained load (or perhaps improve it in the case of +rebalance, which currently does not actively attempt to let work batch up). + +SUSTAINED LOAD REGIME +--------------------- + +When the system is under continuous load, we want these jobs to run +continuously - this is perhaps best modelled with a P/D controller, where +they'll be trying to keep a target value (i.e. fragmented disk space, +available journal space) roughly in the middle of some range. + +The goal under sustained load is to balance our ability to handle load spikes +without running out of x resource (free disk space, free space in the +journal), while also letting some work accumululate to be batched (or become +unnecessary). + +For example, we don't want to run copygc too aggressively, because then it +will be evacuating buckets that would have become empty (been overwritten or +deleted) anyways, and we don't want to wait until we're almost out of free +space because then the system will behave unpredicably - suddenly we're doing +a lot more work to service each write and the system becomes much slower. + +IDLE REGIME +----------- + +When the system becomes idle, we should start flushing our pending work +quicker so the system can go to sleep. + +Note that the definition of "idle" depends on where in the heirarchy a task +is - a task should start flushing work more quickly when the task above it has +stopped generating new work. + +e.g. rebalance should start flushing more quickly when page cache writeback is +idle, and journal reclaim should only start flushing more quickly when both +copygc and rebalance are idle. + +It's important to let work accumulate when more work is still incoming and we +still have room, because flushing is always more efficient if we let it batch +up. New writes may overwrite data before rebalance moves it, and tasks may be +generating more updates for the btree nodes that journal reclaim needs to flush. + +On idle, how much work we do at each interval should be proportional to the +length of time we have been idle for. If we're idle only for a short duration, +we shouldn't flush everything right away; the system might wake up and start +generating new work soon, and flushing immediately might end up doing a lot of +work that would have been unnecessary if we'd allowed things to batch more. + +To summarize, we will need: + + - A list of classes for background tasks that generate work, which will + include one "foreground" class. + - Tracking for each class - "Am I doing work, or have I gone to sleep?" + - And each class should check the class above it when deciding how much work to issue. diff --git a/Documentation/filesystems/bcachefs/index.rst b/Documentation/filesystems/bcachefs/index.rst index e2bd61ccd96f..e5c4c2120b93 100644 --- a/Documentation/filesystems/bcachefs/index.rst +++ b/Documentation/filesystems/bcachefs/index.rst @@ -4,8 +4,35 @@ bcachefs Documentation ====================== +Subsystem-specific development process notes +-------------------------------------------- + +Development notes specific to bcachefs. These are intended to supplement +:doc:`general kernel development handbook </process/index>`. + .. toctree:: - :maxdepth: 2 + :maxdepth: 1 :numbered: + CodingStyle + SubmittingPatches + +Filesystem implementation +------------------------- + +Documentation for filesystem features and their implementation details. +At this moment, only a few of these are described here. + +.. toctree:: + :maxdepth: 1 + :numbered: + + casefolding errorcodes + +Future design +------------- +.. toctree:: + :maxdepth: 1 + + future/idle_work |