summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-15m68k/nommu: stop using GENERIC_IOMAPArnd Bergmann
There is no need to go through the GENERIC_IOMAP wrapper for PIO on nommu platforms, since these always come from PCI I/O space that is itself memory mapped. Instead, the generic ioport_map() can just return the MMIO location of the ports directly by applying the PCI_IO_PA offset, while ioread32/iowrite32 trivially turn into readl/writel as they do on most other architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-15mips: drop GENERIC_IOMAP wrapperArnd Bergmann
All PIO on MIPS platforms is memory mapped, so there is no benefit in the lib/iomap.c wrappers that switch between inb/outb and readb/writeb style accessses. In fact, the '#define PIO_RESERVED 0' setting completely disables the GENERIC_IOMAP functionality, and the '#define PIO_OFFSET mips_io_port_base' setting is based on a misunderstanding of what the offset is meant to do. MIPS started using GENERIC_IOMAP in 2018 with commit b962aeb02205 ("MIPS: Use GENERIC_IOMAP") replacing a simple custom implementation of the same interfaces, but at the time the asm-generic/io.h version was not usable yet. Since the header is now always included, it's now possible to go back to the even simpler version. Use the normal GENERIC_PCI_IOMAP functionality for all mips platforms without the hacky GENERIC_IOMAP, and provide a custom pci_iounmap() for the CONFIG_PCI_DRIVERS_LEGACY case to ensure the I/O port base never gets unmapped. The readsl() prototype needs an extra 'const' keyword to make it compatible with the generic ioread32_rep() alias. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-15Revert "sched/core: Reduce cost of sched_move_task when config autogroup"Dietmar Eggemann
This reverts commit eff6c8ce8d4d7faef75f66614dd20bb50595d261. Hazem reported a 30% drop in UnixBench spawn test with commit eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM (aarch64) (single level MC sched domain): https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com There is an early bail from sched_move_task() if p->sched_task_group is equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope' (Ubuntu '22.04.5 LTS'). So in: do_exit() sched_autogroup_exit_task() sched_move_task() if sched_get_task_group(p) == p->sched_task_group return /* p is enqueued */ dequeue_task() \ sched_change_group() | task_change_group_fair() | detach_task_cfs_rq() | (1) set_task_rq() | attach_task_cfs_rq() | enqueue_task() / (1) isn't called for p anymore. Turns out that the regression is related to sgs->group_util in group_is_overloaded() and group_has_capacity(). If (1) isn't called for all the 'spawn' tasks then sgs->group_util is ~900 and sgs->group_capacity = 1024 (single CPU sched domain) and this leads to group_is_overloaded() returning true (2) and group_has_capacity() false (3) much more often compared to the case when (1) is called. I.e. there are much more cases of 'group_is_overloaded' and 'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which then returns much more often a CPU != smp_processor_id() (5). This isn't good for these extremely short running tasks (FORK + EXIT) and also involves calling sched_balance_find_dst_group_cpu() unnecessary (single CPU sched domain). Instead if (1) is called for 'p->flags & PF_EXITING' then the path (4),(6) is taken much more often. select_task_rq_fair(..., wake_flags = WF_FORK) cpu = smp_processor_id() new_cpu = sched_balance_find_dst_cpu(..., cpu, ...) group = sched_balance_find_dst_group(..., cpu) do { update_sg_wakeup_stats() sgs->group_type = group_classify() if group_is_overloaded() (2) return group_overloaded if !group_has_capacity() (3) return group_fully_busy return group_has_spare (4) } while group if local_sgs.group_type > idlest_sgs.group_type return idlest (5) case group_has_spare: if local_sgs.idle_cpus >= idlest_sgs.idle_cpus return NULL (6) Unixbench Tests './Run -c 4 spawn' on: (a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4') and Ubuntu 22.04.5 LTS (aarch64). Shell & test run in '/user.slice/user-1000.slice/session-1.scope'. w/o patch w/ patch 21005 27120 (b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and Ubuntu 22.04.5 LTS (x86_64). Shell & test run in '/A'. w/o patch w/ patch 67675 88806 CONFIG_SCHED_AUTOGROUP=y & /sys/proc/kernel/sched_autogroup_enabled equal 0 or 1. Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Hagar Hemdan <hagarhem@amazon.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com
2025-03-15sched/uclamp: Optimize sched_uclamp_used static key enablingXuewen Yan
Repeat calls of static_branch_enable() to an already enabled static key introduce overhead, because it calls cpus_read_lock(). Users may frequently set the uclamp value of tasks, triggering the repeat enabling of the sched_uclamp_used static key. Optimize this and avoid repeat calls to static_branch_enable() by checking whether it's enabled already. [ mingo: Rewrote the changelog for legibility ] Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250219093747.2612-2-xuewen.yan@unisoc.com
2025-03-15sched/uclamp: Use the uclamp_is_used() helper instead of open-coding itXuewen Yan
Don't open-code static_branch_unlikely(&sched_uclamp_used), we have the uclamp_is_used() wrapper around it. [ mingo: Clean up the changelog ] Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250219093747.2612-1-xuewen.yan@unisoc.com
2025-03-15Merge tag 'i2c-host-fixes-6.14-rc7' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.14-rc7 - omap: fixed irq ACKS to avoid irq storming and system hang. - ali1535, ali15x3, sis630: fixed error path at probe exit.
2025-03-15crypto: testmgr - Remove NULL dst acomp testsHerbert Xu
In preparation for the partial removal of NULL dst acomp support, remove the tests for them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: acomp - Add request chaining and virtual addressesHerbert Xu
This adds request chaining and virtual address support to the acomp interface. It is identical to the ahash interface, except that a new flag CRYPTO_ACOMP_REQ_NONDMA has been added to indicate that the virtual addresses are not suitable for DMA. This is because all existing and potential acomp users can provide memory that is suitable for DMA so there is no need for a fall-back copy path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: scomp - Disable BH when taking per-cpu spin lockHerbert Xu
Disable BH when taking per-cpu spin locks. This isn't an issue right now because the only user zswap calls scomp from process context. However, if scomp is called from softirq context the spin lock may dead-lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: acomp - Move stream management into scomp layerHerbert Xu
Rather than allocating the stream memory in the request object, move it into a per-cpu buffer managed by scomp. This takes the stress off the user from having to manage large request objects and setting up their own per-cpu buffers in order to do so. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: scomp - Remove tfm argument from alloc/free_ctxHerbert Xu
The tfm argument is completely unused and meaningless as the same stream object is identical over all transforms of a given algorithm. Remove it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: api - Add cra_type->destroy hookHerbert Xu
Add a cra_type->destroy hook so that resources can be freed after the last user of a registered algorithm is gone. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: artpec6 - change from kzalloc to kcalloc in artpec6_crypto_probe()Ethan Carter Edwards
We are trying to get rid of all multiplications from allocation functions to prevent potential integer overflows. Here the multiplication is probably safe, but using kcalloc() is more appropriate and improves readability. This patch has no effect on runtime behavior. Link: https://github.com/KSPP/linux/issues/162 [1] Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: skcipher - Make skcipher_walk src.virt.addr constHerbert Xu
Mark the src.virt.addr field in struct skcipher_walk as a pointer to const data. This guarantees that the user won't modify the data which should be done through dst.virt.addr to ensure that flushing is done when necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: skcipher - Eliminate duplicate virt.addr fieldHerbert Xu
Reuse the addr field from struct scatter_walk for skcipher_walk. Keep the existing virt.addr fields but make them const for the user to access the mapped address. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: scatterwalk - Add memcpy_sglistHerbert Xu
Add memcpy_sglist which copies one SG list to another. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: scatterwalk - Change scatterwalk_next calling conventionHerbert Xu
Rather than returning the address and storing the length into an argument pointer, add an address field to the walk struct and use that to store the address. The length is returned directly. Change the done functions to use this stored address instead of getting them from the caller. Split the address into two using a union. The user should only access the const version so that it is never changed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: ccp - Fix uAPI definitions of PSP errorsDionna Glaze
Additions to the error enum after explicit 0x27 setting for SEV_RET_INVALID_KEY leads to incorrect value assignments. Use explicit values to match the manufacturer specifications more clearly. Fixes: 3a45dc2b419e ("crypto: ccp: Define the SEV-SNP commands") CC: stable@vger.kernel.org Signed-off-by: Dionna Glaze <dionnaglaze@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15dt-bindings: rng: rockchip,rk3588-rng: Drop unnecessary status from exampleKrzysztof Kozlowski
Device nodes are enabled by default, so no need for 'status = "okay"' in the DTS example. Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15MAINTAINERS: Add Lukas & Ignat & Stefan for asymmetric keysLukas Wunner
Herbert asks for long-term maintenance of everything under crypto/asymmetric_keys/ and associated algorithms (ECDSA, GOST, RSA) [1]. Ignat has kindly agreed to co-maintain this with me going forward. Stefan has agreed to be added as reviewer for ECDSA. He introduced it in 2021 and has been meticulously providing reviews for 3rd party patches anyway. Retain David Howells' maintainer entry until he explicitly requests to be removed. He originally introduced asymmetric keys in 2012. RSA was introduced by Tadeusz Struk as an employee of Intel in 2015, but he's changed jobs and last contributed to the implementation in 2016. GOST was introduced by Vitaly Chikunov as an employee of Basealt LLC [2] (Базальт СПО [3]) in 2019. This company is an OFAC sanctioned entity [4][5], which makes employees ineligible as maintainer [6]. It's not clear if Vitaly is still working for Basealt, he did not immediately respond to my e-mail. Since knowledge and use of GOST algorithms is relatively limited outside the Russian Federation, assign "Odd fixes" status for now. [1] https://lore.kernel.org/r/Z8QNJqQKhyyft_gz@gondor.apana.org.au/ [2] https://prohoster.info/ru/blog/novosti-interneta/reliz-yadra-linux-5-2 [3] https://www.basealt.ru/ [4] https://ofac.treasury.gov/recent-actions/20240823 [5] https://sanctionssearch.ofac.treas.gov/Details.aspx?id=50178 [6] https://lore.kernel.org/r/7ee74c1b5b589619a13c6318c9fbd0d6ac7c334a.camel@HansenPartnership.com/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: octeontx2 - suppress auth failure screaming due to negative testsShashank Gupta
This patch addresses an issue where authentication failures were being erroneously reported due to negative test failures in the "ccm(aes)" selftest. pr_debug suppress unnecessary screaming of these tests. Signed-off-by: Shashank Gupta <shashankg@marvell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15MAINTAINERS: add myself to co-maintain ZSTDDavid Sterba
Recently the ZSTD tree has been removed from linux-next due to lack of updates for last year [1]. As there are several users of ZSTD in kernel we need to keep the code maintained and updated. I'll act as a backup to get the ZSTD upstream to linux, Nick is OK with that [2]. [1] https://lore.kernel.org/all/20250216224200.50b9dd6a@canb.auug.org.au/ [2] https://github.com/facebook/zstd/issues/4262#issuecomment-2691527952 CC: Nick Terrell <terrelln@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: virtio - Erase some sensitive memory when it is freedChristophe JAILLET
virtcrypto_clear_request() does the same as the code here, but uses kfree_sensitive() for one of the free operation. So, better safe than sorry, use virtcrypto_clear_request() directly to save a few lines of code and cleanly free the memory. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15async_xor: Remove unused 'async_xor_val'Dr. David Alan Gilbert
async_xor_val has been unused since commit a7c224a820c3 ("md/raid5: convert to new xor compution interface") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-14Merge tag 'v6.14-rc6-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Two fixes for oplock break/lease races * tag 'v6.14-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: prevent connection release during oplock break notification ksmbd: fix use-after-free in ksmbd_free_work_struct
2025-03-14perf test: Add pipe output testing for annotateIan Rogers
Parameterize the basic testing to generate directly a perf.data file or to generate/use one from pipe input or output. To simplify the refactor move some of the head/grep logic around. Use "-q" with grep to make the test output cleaner. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311211635.541090-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14perf test: Fixes to variable expansion and stdout for diff testIan Rogers
When make_data fails its error message needs to go to stderr rather than stdout and the stdout value is captured in a variable. Quote the $err value so that it is always a valid input for test. This error is commonly encountered if no sample data is gathered by the test. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312001841.1515779-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14perf libunwind: Fixup conversion perf_sample->user_regs to a pointerArnaldo Carvalho de Melo
The dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional") misses the changes to a file, resulting in this problem: $ make LIBUNWIND=1 -C tools/perf O=/tmp/build/perf-tools-next install-bin <SNIP> CC /tmp/build/perf-tools-next/util/unwind-libunwind-local.o CC /tmp/build/perf-tools-next/util/unwind-libunwind.o <SNIP> util/unwind-libunwind-local.c: In function ‘access_mem’: util/unwind-libunwind-local.c:582:56: error: ‘ui->sample->user_regs’ is a pointer; did you mean to use ‘->’? 582 | if (__write || !stack || !ui->sample->user_regs.regs) { | ^ | -> util/unwind-libunwind-local.c:587:38: error: passing argument 2 of ‘perf_reg_value’ from incompatible pointer type [-Wincompatible-pointer-types] 587 | ret = perf_reg_value(&start, &ui->sample->user_regs, | ^~~~~~~~~~~~~~~~~~~~~~ | | | struct regs_dump ** <SNIP> ⬢ [acme@toolbox perf-tools-next]$ git bisect bad dc6d2bc2d893a878e7b58578ff01b4738708deb4 is the first bad commit commit dc6d2bc2d893a878e7b58578ff01b4738708deb4 (HEAD) Author: Ian Rogers <irogers@google.com> Date: Mon Jan 13 11:43:45 2025 -0800 perf sample: Make user_regs and intr_regs optional Detected using: make -C tools/perf build-test Fixes: dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250313033121.758978-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14bcachefs: Kill bch2_remount()Kent Overstreet
Single caller, so inline it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Kill a bit of dead codeKent Overstreet
Found with CC=clang W=1 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Use max() to improve gen_after()Thorsten Blum
Use max() to simplify gen_after() and improve its readability. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Remove unnecessary byte allocationThorsten Blum
The extra byte is not used - remove it. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: We no longer read stripes into memory at startupKent Overstreet
And the stripes heap gets deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: trace_stripe_createKent Overstreet
Add a simple tracepoint for stripe creation, we'll want to expand this later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: get_existing_stripe() uses new stripe lruKent Overstreet
Convert to the new persistent stripe LRU. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: ec_stripe_delete() uses new stripe lruKent Overstreet
Convert to the new persistent stripe LRU. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: journal write path commentKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Kick devices out after too many write IO errorsKent Overstreet
We're improving our handling of write errors - we shouldn't write degraded data just because a write failed once, we should retry it (on other devices, if possible). But for this to work, we need to kick devices out when they're only returning errors - otherwise those retries will loop infinitely. This adds a configurable timeout - if writes are failing for too long, we'll set that device read-only. In the future we should also implement more tracking and another knob for an "allowed error rate", so that we can kick out drives that are acting "unhealthy". Another thing we'll want is a mechanism (likely in userspace) for bringing a device back in after a transient error - perhaps a cable was jiggled, or there was a controller reset. After transient errors we also need a mechanism to walk (from the journal) recent btree updates that weren't flushed to that device and treat them as "degraded", since unflushed data may well not have been written. Out of scope for this patch, but becoming relevant. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Change BCH_MEMBER_STATE_failed semanticsKent Overstreet
Previously, we woudn't try to read at all from a failed device - that doesn't make much sense, the device may be unhealthy (perhaps taking longer than it should to service reads), but if it's our only option we should still try to read from it. Now, bch2_bkey_pick_read_device() will pick failed devices only if there are no non-failed replicas to read from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: bch2_dev_get_ioref() may now sleepKent Overstreet
The next patch implementing freezing will change bch2_dev_get_ioref() to sleep if a device is currently frozen. Add an annotation and fix the journal code accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Fix btree_node_scan io_ref handlingKent Overstreet
This was completely fubar; it's now simplified a bit as well. Note that for_each_online_member() takes and releases io_refs as it iterates, so we need to release that if we break. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Implement blk_holder_opsKent Overstreet
We can't use the standard fs_holder_ops because they're meant for single device filesystems - fs_bdev_mark_dead() in particular - and they assume that the blk_holder is the super_block, which also doesn't work for a multi device filesystem. These generally follow the standard fs_holder_ops; the locking/refcounting is a bit simplified because c->ro_ref suffices, and bch2_fs_bdev_mark_dead() is not necessarily shutting down the entire filesystem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Make sure c->vfs_sb is set before starting fsKent Overstreet
This is necessary for the new blk_holder_ops, which want the vfs super_block available for synchronization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Stash a pointer to the filesystem for blk_holder_opsKent Overstreet
Note that we open block devices before we allocate bch_fs, but once attached to a filesystem they will be closed before the bch_fs is torn down - so stashing a pointer without a refcount looks incorrect but it's not. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Finish bch2_account_io_completion() conversionsKent Overstreet
More prep work for automatically kicking devices out after too many IO errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: bch2_account_io_completion()Kent Overstreet
We need to start accounting successes for every IO, not just failures, so introduce a unified hook for io completion accounting and convert io_read.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Fix read path io_ref handlingKent Overstreet
We were using our device pointer after we'd released our ref to it. Unlikely to be a race that's practical to hit, since actually removing a member device is a whole process besides just taking it offline, but - needs to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: data_update now checks for extents that can't be movedKent Overstreet
If a device is ro or failed, we might not have anywhere to move a replica. Check for this early, before doing the read and attempting to write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: give bch2_write_super() a proper error codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: bcachefs_metadata_version_extent_flagsKent Overstreet
This implements a new extent field bitflags that apply to the whole extent. There's been a couple things we've wanted this for in the past, but the immediate need is extent poisoning, to solve a rebalance issue. Unknown extent fields can't be parsed (we won't known their size, so we can't advance to the next field), so this is an incompat feature, and using it prevents the filesystem from being mounted by old versions. This also adds the BCH_EXTENT_poisoned flag; this indicates that the data is known to be bad (i.e. there was a checksum error, and we had to write a new checksum) and reads will return errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>