summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-07x86/sgx: Fix NULL pointer dereference on non-SGX systemsDave Hansen
== Problem == Nathan Chancellor reported an oops when aceessing the 'sgx_total_bytes' sysfs file: https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/ The sysfs output code accesses the sgx_numa_nodes[] array unconditionally. However, this array is allocated during SGX initialization, which only occurs on systems where SGX is supported. If the sysfs file is accessed on systems without SGX support, sgx_numa_nodes[] is NULL and an oops occurs. == Solution == To fix this, hide the entire nodeX/x86/ attribute group on systems without SGX support using the ->is_visible attribute group callback. Unfortunately, SGX is initialized via a device_initcall() which occurs _after_ the ->is_visible() callback. Instead of moving SGX initialization earlier, call sysfs_update_group() during SGX initialization to update the group visiblility. This update requires moving the SGX sysfs code earlier in sgx/main.c. There are no code changes other than the addition of arch_update_sysfs_visibility() and a minor whitespace fixup to arch_node_attr_is_visible() which checkpatch caught. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-sgx@vger.kernel.org Cc: x86@kernel.org Fixes: 50468e431335 ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com
2022-01-07sch_cake: revise Diffserv docsKevin Bracey
Documentation incorrectly stated that CS1 is equivalent to LE for diffserv8. But when LE was added to the table, CS1 was pushed into tin 1, leaving only LE in tin 0. Also "TOS1" no longer exists, as that is the same codepoint as LE. Make other tweaks properly distinguishing codepoints from classes and putting current Diffserve codepoints ahead of legacy ones. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220106215637.3132391-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07scripts: sphinx-pre-install: Fix ctex support on DebianMauro Carvalho Chehab
The name of the package with ctexhook.sty is different on Debian/Ubuntu. Reported-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Tested-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/r/63882425609a2820fac78f5e94620abeb7ed5f6f.1641429634.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: discourage use of list tablesJonathan Corbet
Our documentation encourages the use of list-table formats, but that advice runs counter to the objective of keeping the plain-text documentation as useful and readable as possible. Turn that advice around the other way so that people don't keep adding these tables. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: 5.Posting.rst: describe Fixes: and Link: tagsThorsten Leemhuis
Explain Fixes: and Link: tags in Documentation/process/5.Posting.rst, which are missing in this file for unknown reasons and only described in Documentation/process/submitting-patches.rst. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> CC: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://lore.kernel.org/r/c4a5f5e25fa84b26fd383bba6eafde4ab57c9de7.1641314856.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07Documentation: kgdb: Replace deprecated remotebaudChristian Löhle
Using set remotebaud to set the baud rate was deprecated in gdb-7.7 and completely removed from the command parser in gdb-7.8 (released in 2014). Adopt set serial baud instead. Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/4050689967ed46baaa3bfadda53a0e73@hyperstone.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07docs: automarkup.py: Fix invalid HTML link output and broken URI fragmentsJames Clark
Since commit d18b01789ae5 ("docs: Add automatic cross-reference for documentation pages"), references that were already explicitly defined with "ref:" and referred to other pages with a path have been doubled. This is reported as the following error by Firefox: Start tag "a" seen but an element of the same type was already open. End tag "a" violates nesting rules. As well as the invalid HTML, this also obscures the URI fragment links to subsections because the second link overrides the first. For example on the page admin-guide/hw-vuln/mds.html the last link should be to the "Default Mitigations" subsection using a # URI fragment: admin-guide/hw-vuln/l1tf.html#default-mitigations But it is obsured by a second link to the whole page: admin-guide/hw-vuln/l1tf.html The full HTML with the double <a> tags looks like this: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref"> <a class="reference internal" href="l1tf.html"> <span class="doc">L1TF - L1 Terminal Fault</span> </a> </span> </a> After this commit, there is only a single link: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref">Documentation/admin-guide/hw-vuln//l1tf.rst</span> </a> Now that the second link is removed, the browser correctly jumps to the default-mitigations subsection when clicking the link. The fix is to check that nodes in the document to be modified are not already references. A reference is counted as any text that is a descendant of a reference type node. Only plain text should be converted to new references, otherwise the doubling occurs. Testing ======= * Test that the build stdout is the same (ignoring ordering), and that no new warnings are printed. * Diff all .html files and check that the only modifications occur to the bad double links. * The auto linking of bare references to pages without "ref:" is still working. Fixes: d18b01789ae5 ("docs: Add automatic cross-reference for documentation pages") Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20220105143640.330602-2-james.clark@arm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07netrom: fix api breakage in nr_setsockopt()Dan Carpenter
This needs to copy an unsigned int from user space instead of a long to avoid breaking user space with an API change. I have updated all the integer overflow checks from ULONG to UINT as well. This is a slight API change but I do not expect it to affect anything in real life. Fixes: 3087a6f36ee0 ("netrom: fix copying in user data in nr_setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07ax25: uninitialized variable in ax25_setsockopt()Dan Carpenter
The "opt" variable is unsigned long but we only copy 4 bytes from the user so the lower 4 bytes are uninitialized. I have changed the integer overflow checks from ULONG to UINT as well. This is a slight API change but I don't expect it to break anything. Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07Merge branch 'octeontx2-ptp-bugs'David S. Miller
Subbaraya Sundeep says: ==================== octeontx2: Fix PTP bugs This patchset addresses two problems found when using ptp. Patch 1 - Increases the refcount of ptp device before use which was missing and it lead to refcount increment after use bug when module is loaded and unloaded couple of times. Patch 2 - PTP resources allocated by VF are not being freed during VF teardown. This patch fixes that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07octeontx2-nicvf: Free VF PTP resources.Rakesh Babu Saladi
When a VF is removed respective PTP resources are not being freed currently. This patch fixes it. Fixes: 43510ef4ddad ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07octeontx2-af: Increment ptp refcount before useSubbaraya Sundeep
Before using the ptp pci device by AF driver increment the reference count of it. Fixes: a8b90c9d26d6 ("octeontx2-af: Add PTP device id for CN10K and 95O silcons") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probeMiaoqian Lin
If the probe fails, we should use pm_runtime_disable() to balance pm_runtime_enable(). Add missing pm_runtime_disable() for meson_spifc_probe. Fixes: c3e4bc5434d2 ("spi: meson: Add support for Amlogic Meson SPIFC") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220107075424.7774-1-linmq006@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07spi: atmel: Fix typoQinghua Jin
Change 'actualy' to 'actually' Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com> Link: https://lore.kernel.org/r/20220107024631.396862-1-qhjin.dev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07regulator: Add MAX20086-MAX20089 driverWatson Chow
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add a driver that supports controlling the outputs individually. Additional features, such as overcurrent detection, may be added later if needed. Signed-off-by: Watson Chow <watson.chow@avnet.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-3-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07dt-bindings: regulators: Add bindings for Maxim MAX20086-MAX20089Laurent Pinchart
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add corresponding DT bindings. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-2-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07btrfs: output more debug messages for uncommitted transactionQu Wenruo
Print extra information about how many dirty bytes an uncommitted has at the end of mount. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: respect the max size in the header when activating swap fileFilipe Manana
If we extended the size of a swapfile after its header was created (by the mkswap utility) and then try to activate it, we will map the entire file when activating the swap file, instead of limiting to the max size defined in the swap file's header. Currently test case generic/643 from fstests fails because we do not respect that size limit defined in the swap file's header. So fix this by not mapping file ranges beyond the max size defined in the swap header. This is the same type of bug that iomap used to have, and was fixed in commit 36ca7943ac18ae ("mm/swap: consider max pages in iomap_swapfile_add_extent"). Fixes: ed46ff3d423780 ("Btrfs: support swap files") CC: stable@vger.kernel.org # 5.4+ Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: fix argument list that the kdoc format and script verifiedYang Li
The warnings were found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/btrfs/extent_io.c:3210: warning: Function parameter or member 'bio_ctrl' not described in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'bio' description in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'prev_bio_flags' description in 'btrfs_bio_add_page' fs/btrfs/space-info.c:1602: warning: Excess function parameter 'root' description in 'btrfs_reserve_metadata_bytes' fs/btrfs/space-info.c:1602: warning: Function parameter or member 'fs_info' not described in 'btrfs_reserve_metadata_bytes' Note: this is fixing only the warnings regarding parameter list, the first line is not strictly conforming to the kdoc format as the btrfs codebase does not stick to that and keeps the first line more free form (because it's only for internal use). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: remove unnecessary parameter type from compression_decompress_bioSu Yue
btrfs_decompress_bio, the only caller of compression_decompress_bio gets type from @cb and passes it to compression_decompress_bio. However, compression_decompress_bio can get compression type directly from @cb. So remove the parameter and access it through @cb. No functional change. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: selftests: dump extent io tree if extent-io-tree test failedQu Wenruo
When code modifying extent-io-tree get modified and got that selftest failed, it can take some time to pin down the cause. To make it easier to expose the problem, dump the extent io tree if the selftest failed. This can save developers debug time, especially since the selftest we can not use the trace events, thus have to manually add debug trace points. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: cleanup the argument list of scrub_stripe()Qu Wenruo
The argument list of btrfs_stripe() has similar problems of scrub_chunk(): - Duplicated and ambiguous @base argument Can be fetched from btrfs_block_group::bg. - Ambiguous argument @length It's again device extent length - Ambiguous argument @num The instinctive guess would be mirror number, but in fact it's stripe index. Fix it by: - Remove @base parameter - Rename @length to @dev_extent_len - Rename @num to @stripe_index Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: cleanup the argument list of scrub_chunk()Qu Wenruo
The argument list of scrub_chunk() has the following problems: - Duplicated @chunk_offset It is the same as btrfs_block_group::start. - Confusing @length The most instinctive guess is chunk length, and one may want to delete it, but the truth is, it's the device extent length. Fix this by: - Remove @chunk_offset Use btrfs_block_group::start instead. - Rename @length to @dev_extent_len Also rename the caller to remove the ambiguous naming. - Rename @cache to @bg The "_cache" suffix for btrfs_block_group has been removed for a while. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: remove reada infrastructureQu Wenruo
Currently there is only one user for btrfs metadata readahead, and that's scrub. But even for the single user, it's not providing the correct functionality it needs, as scrub needs reada for commit root, which current readahead can't provide. (Although it's pretty easy to add such feature). Despite this, there are some extra problems related to metadata readahead: - Duplicated feature with btrfs_path::reada - Partly duplicated feature of btrfs_fs_info::buffer_radix Btrfs already caches its metadata in buffer_radix, while readahead tries to read the tree block no matter if it's already cached. - Poor layer separation Metadata readahead works kinda at device level. This is definitely not the correct layer it should be, since metadata is at btrfs logical address space, it should not bother device at all. This brings extra chance for bugs to sneak in, while brings unnecessary complexity. - Dead code In the very beginning of scrub.c we have #undef DEBUG, rendering all the debug related code useless and unable to test. Thus here I purpose to remove the metadata readahead mechanism completely. [BENCHMARK] There is a full benchmark for the scrub performance difference using the old btrfs_reada_add() and btrfs_path::reada. For the worst case (no dirty metadata, slow HDD), there could be a 5% performance drop for scrub. For other cases (even SATA SSD), there is no distinguishable performance difference. The number is reported scrub speed, in MiB/s. The resolution is limited by the reported duration, which only has a resolution of 1 second. Old New Diff SSD 455.3 466.332 +2.42% HDD 103.927 98.012 -5.69% Comprehensive test methodology is in the cover letter of the patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: use btrfs_path::reada for extent tree readaheadQu Wenruo
For scrub, we trigger two readaheads for two trees, extent tree to get where to scrub, and csum tree to get the data checksum. For csum tree we already trigger readahead in btrfs_lookup_csums_range(), by setting path->reada. But for extent tree we don't have any path based readahead. Add the readahead for extent tree as well, so we can later remove the btrfs_reada_add() based readahead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()Qu Wenruo
In function scrub_stripe() we allocated two btrfs_path's, one @path for extent tree search and another @ppath for full stripe extent tree search for RAID56. This is totally umncessary, as the @ppath usage is completely inside scrub_raid56_parity(), thus we can move the path allocation into scrub_raid56_parity() completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: refactor unlock_upNikolay Borisov
The purpose of this function is to unlock all nodes in a btrfs path which are above 'lowest_unlock' and whose slot used is different than 0. As such it used slightly awkward structure of 'if' as well as somewhat cryptic "no_skip" control variable which denotes whether we should check the current level of skipability or no. This patch does the following (cosmetic) refactorings: * Renames 'no_skip' to 'check_skip' and makes it a boolean. This variable controls whether we are below the lowest_unlock/skip_level levels. * Consolidates the 2 conditions which warrant checking whether the current level should be skipped under 1 common if (check_skip) branch, this increase indentation level but is not critical. * Consolidates the 'skip_level < i && i >= lowest_unlock' and 'i >= lowest_unlock && i > skip_level' condition into a common branch since those are identical. * Eliminates the local extent_buffer variable as in this case it doesn't bring anything to function readability. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: skip transaction commit after failure to create subvolumeFilipe Manana
At ioctl.c:create_subvol(), when we fail to create a subvolume we always commit the transaction. In most cases this is a no-op, since all the error paths, except for one, abort the transaction - the only exception is when we fail to insert the new root item into the root tree, in that case we don't abort the transaction because we didn't do anything that is irreversible - however we end up committing the transaction which although is not a functional problem, it adds unnecessary rotation of the backup roots in the superblock and unnecessary work. So change that to commit a transaction only when no error happened, otherwise just call btrfs_end_transaction() to release our reference on the transaction. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: fix chunk allocation condition for zoned allocatorNaohiro Aota
The ZNS specification defines a limit on the number of "active" zones. That limit impose us to limit the number of block groups which can be used for an allocation at the same time. Not to exceed the limit, we reuse the existing active block groups as much as possible when we can't activate any other zones without sacrificing an already activated block group in commit a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space"). However, the check is wrong in two ways. First, it checks the condition for every raid index (ffe_ctl->index). Even if it reaches the condition and "ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size" is met, there can be other block groups having enough space to hold ffe_ctl->num_bytes. (Actually, this won't happen in the current zoned code as it only supports SINGLE profile. But, it can happen once it enables other RAID types.) Second, it checks the active zone availability depending on the raid index. The raid index is just an index for space_info->block_groups, so it has nothing to do with chunk allocation. These mistakes are causing a faulty allocation in a certain situation. Consider we are running zoned btrfs on a device whose max_active_zone == 0 (no limit). And, suppose no block group have a room to fit ffe_ctl->num_bytes but some room to meet ffe_ctl->min_alloc_size (i.e. max_extent_size > num_bytes >= min_alloc_size). In this situation, the following occur: - With SINGLE raid_index, it reaches the chunk allocation checking code - The check returns true because we can activate a new zone (no limit) - But, before allocating the chunk, it iterates to the next raid index (RAID5) - Since there are no RAID5 block groups on zoned mode, it again reaches the check code - The check returns false because of btrfs_can_activate_zone()'s "if (raid_index != BTRFS_RAID_SINGLE)" part - That results in returning -ENOSPC without allocating a new chunk As a result, we end up hitting -ENOSPC too early. Move the check to the right place in the can_allocate_chunk() hook, and do the active zone check depending on the allocation flag, not on the raid index. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: add extent allocator hook to decide to allocate chunk or notNaohiro Aota
Introduce a new hook for an extent allocator policy. With the new hook, a policy can decide to allocate a new block group or not. If not, it will return -ENOSPC, so btrfs_reserve_extent() will cut the allocation size in half and retry the allocation if min_alloc_size is large enough. The hook has a place holder and will be replaced with the real implementation in the next patch. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: unset dedicated block group on allocation failureNaohiro Aota
Allocating an extent from a block group can fail for various reasons. When an allocation from a dedicated block group (for tree-log or relocation data) fails, we need to unregister it as a dedicated one so that we can allocate a new block group for the dedicated one. However, we are returning early when the block group in case it is read-only, fully used, or not be able to activate the zone. As a result, we keep the non-usable block group as a dedicated one, leading to further allocation failure. With many block groups, the allocator will iterate hopeless loop to find a free extent, results in a hung task. Fix the issue by delaying the return and doing the proper cleanups. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zonedJohannes Thumshirn
REQ_OP_ZONE_APPEND can only work on zoned devices, so it is redundant to check if the filesystem is zoned when REQ_OP_ZONE_APPEND is set as the bio's bio_op. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: sink zone check into btrfs_repair_one_zoneJohannes Thumshirn
Sink zone check into btrfs_repair_one_zone() so we don't need to do it in all callers. Also as btrfs_repair_one_zone() doesn't return a sensible error, make it a boolean function and return false in case it got called on a non-zoned filesystem and true on a zoned filesystem. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: simplify btrfs_check_meta_write_pointerJohannes Thumshirn
btrfs_check_meta_write_pointer() will always be called with a NULL 'cache_ret' argument. As there's no need to check if we have a valid block_group passed in remove these checks. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: zoned: encapsulate inode locking for zoned relocationJohannes Thumshirn
Encapsulate the inode lock needed for serializing the data relocation writes on a zoned filesystem into a helper. This streamlines the code reading flow and hides special casing for zoned filesystems. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the deviceAnand Jain
In the case of the seed device, the fsid can be different from the mounted sprout fsid. The userland has to read the device superblock to know the fsid but, that idea fails if the device is missing. So add a sysfs interface devinfo/<devid>/fsid to show the fsid of the device. For example: $ cd /sys/fs/btrfs/b10b02a5-f9de-4276-b9e8-2bfd09a578a8 $ cat devinfo/1/fsid c44d771f-639d-4df3-99ec-5bc7ad2af93b $ cat devinfo/3/fsid b10b02a5-f9de-4276-b9e8-2bfd09a578a8 Though it's related to seeding, the name of the sysfs file is plain fsid as it matches what blkid says. A path to the device's fsid will aid scripting. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: reserve extra space for the free space treeJosef Bacik
Filipe reported a problem where sometimes he'd get an ENOSPC abort when running delayed refs with generic/619 and the free space tree enabled. This is partly because we do not reserve space for modifying the free space tree, nor do we have a block rsv associated with that tree. The delayed_refs_rsv tracks the amount of space required to run delayed refs. This means 1 modification means 1 change to the extent root. With the free space tree this turns into 2 changes, because modifying 1 extent means updating the extent tree and potentially updating the free space tree to either remove that entry or add the free space. Thus if we have the FST enabled, simply double the reservation size for our modification. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: include the free space tree in the global rsv minimum calculationJosef Bacik
Filipe reported a problem where generic/619 was failing with an ENOSPC abort while running delayed refs, like the following BTRFS: Transaction aborted (error -28) WARNING: CPU: 3 PID: 522920 at fs/btrfs/free-space-tree.c:1049 add_to_free_space_tree+0xe5/0x110 [btrfs] CPU: 3 PID: 522920 Comm: kworker/u16:19 Tainted: G W 5.16.0-rc2-btrfs-next-106 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] RIP: 0010:add_to_free_space_tree+0xe5/0x110 [btrfs] RSP: 0000:ffffa65087fb7b20 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff9131eeaa RDI: 00000000ffffffff RBP: ffff8d62e26481b8 R08: ffffffff9ad97ce0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe4 R13: ffff8d61c25fe688 R14: ffff8d61ebd88800 R15: ffff8d61ebd88a90 FS: 0000000000000000(0000) GS:ffff8d64ed400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa46a8b1000 CR3: 0000000148d18003 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __btrfs_free_extent+0x516/0x950 [btrfs] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs] btrfs_run_delayed_refs+0x86/0x210 [btrfs] flush_space+0x403/0x630 [btrfs] ? call_rcu_tasks_generic+0x50/0x80 ? lock_release+0x223/0x4a0 ? btrfs_get_alloc_profile+0xb5/0x290 [btrfs] ? do_raw_spin_unlock+0x4b/0xa0 btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs] process_one_work+0x24c/0x5b0 worker_thread+0x55/0x3c0 ? process_one_work+0x5b0/0x5b0 kthread+0x17c/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 There's a couple of reasons for this, but in generic/619's case the largest reason is because it is a very small file system, ad we do not reserve enough space for the global reserve. With the free space tree we now have the free space tree that we need to modify when running delayed refs. This means we need the global reserve to take this into account when it calculates the minimum size it needs to be. This is especially important for very small file systems. Fix this by adjusting the minimum global block rsv size math to include the size of the free space tree when calculating the size. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: scrub: merge SCRUB_PAGES_PER_RD_BIO and SCRUB_PAGES_PER_WR_BIOQu Wenruo
These two values were introduced in commit ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk") as an optimization. But the truth is, block layer scheduler can do whatever it wants to merge/split bios to improve performance. Doing such "optimization" is not really going to affect much, especially considering how good current block layer optimizations are doing. Remove such old and immature optimization from our code. Since we're here, also change BUG_ON()s using these two macros to use ASSERT()s. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: update SCRUB_MAX_PAGES_PER_BLOCKQu Wenruo
Use BTRFS_MAX_METADATA_BLOCKSIZE and SZ_4K (minimal sectorsize) to calculate this value. And remove one stale comment on the value, in fact with recent subpage support, BTRFS_MAX_METADATA_BLOCKSIZE * PAGE_SIZE is already beyond BTRFS_STRIPE_LEN, just we don't use the full page. Also since we're here, update the BUG_ON() related to SCRUB_MAX_PAGES_PER_BLOCK to ASSERT(). As those ASSERT() are really only for developers to catch early obvious bugs, not to let end users suffer. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: do not check -EAGAIN when truncating inodes in the log rootJosef Bacik
We only throttle the btrfs_truncate_inode_items if the root is SHAREABLE, which isn't set on the log root, which means this loop is unnecessary. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: make should_throttle loop local in btrfs_truncate_inode_itemsJosef Bacik
We reset this bool on every loop through the truncate loop, make this variable local to the loop. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: combine extra if statements in btrfs_truncate_inode_itemsJosef Bacik
We have if (del_item) // do something else // something else if (del_item) // do yet another thing else // something else entirely back to back in btrfs_truncate_inode_items, collapse these two sets of if statements into one. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: convert BUG() for pending_del_nr into an ASSERTJosef Bacik
This is a logic correctness check, convert it into an ASSERT() instead of a BUG(). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: convert BUG_ON() in btrfs_truncate_inode_items to ASSERTJosef Bacik
We have a correctness BUG_ON() in btrfs_truncate_inode_items to make sure that we're always using min_type == BTRFS_EXTENT_DATA_KEY if new_size is > 0. Convert this to an ASSERT. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: add inode to truncate controlJosef Bacik
In the future we're going to want to use btrfs_truncate_inode_items without looking up the associated inode. In order to accommodate this add the inode to btrfs_truncate_control and handle the case where control->inode is NULL appropriately. This is fairly straightforward, we simply need to add a helper for the trace points, as the file extent map update is controlled by a flag on btrfs_truncate_control. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: pass the ino via truncate controlJosef Bacik
In the future we are going to want to truncate inode items without needing to have an btrfs_inode to pass in, so add ino to the btrfs_truncate_control and use that to look up the inode items to truncate. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: use a flag to control when to clear the file extent rangeJosef Bacik
We only care about updating the file extent range when we are doing a normal truncation. We skip this for tree logging currently, but we can also skip this for eviction as well. Using a flag makes it more explicit when we want to do this work. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: control extent reference updates with a control flag for truncateJosef Bacik
We've had weird bugs in the past where we forgot to adjust the truncate path to deal with the fact that we can be called by the tree log path. Instead of checking if our root is a LOG_ROOT use a flag on the btrfs_truncate_control to indicate that we don't want to do extent reference updates during this truncate. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: only call inode_sub_bytes in truncate paths that careJosef Bacik
We currently have a bunch of awkward checks to make sure we only update the inode i_bytes if we're truncating the real inode. Instead keep track of the number of bytes we need to sub in the btrfs_truncate_control, and then do the appropriate adjustment in the truncate paths that care. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>