Age | Commit message | Author |
|
There is no need to go through the GENERIC_IOMAP wrapper for PIO on
nommu platforms, since these always come from PCI I/O space that is
itself memory mapped.
Instead, the generic ioport_map() can just return the MMIO location
of the ports directly by applying the PCI_IO_PA offset, while
ioread32/iowrite32 trivially turn into readl/writel as they do
on most other architectures.
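As a rough user-space illustration of the point above: once every port lives in a memory-mapped window, mapping a port is just an offset into that window and the 32-bit accessors are plain loads and stores. This is only a sketch. `fake_pci_io` stands in for the window at PCI_IO_PA, and the `sketch_*` functions are hypothetical names, not the kernel's ioport_map(), ioread32() or iowrite32().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the memory-mapped PCI I/O window (PCI_IO_PA). */
static uint8_t fake_pci_io[0x100];

/* Port number -> MMIO cookie: simply offset into the window. */
void *sketch_ioport_map(unsigned long port)
{
	return fake_pci_io + port;
}

/*
 * With all PIO memory mapped, the iomap accessors need no inb/outb
 * branch; they reduce to a load and a store (readl/writel in the
 * kernel, which additionally use volatile accesses and barriers).
 */
uint32_t sketch_ioread32(const void *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

void sketch_iowrite32(uint32_t v, void *addr)
{
	memcpy(addr, &v, sizeof(v));
}
```

A value written through the cookie returned for port 0x10 reads back unchanged from the same cookie, with no dispatch on the address range.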
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
All PIO on MIPS platforms is memory mapped, so there is no benefit in
the lib/iomap.c wrappers that switch between inb/outb and readb/writeb
style accesses.
In fact, the '#define PIO_RESERVED 0' setting completely disables
the GENERIC_IOMAP functionality, and the '#define PIO_OFFSET
mips_io_port_base' setting is based on a misunderstanding of what the
offset is meant to do.
MIPS started using GENERIC_IOMAP in 2018 with commit b962aeb02205 ("MIPS:
Use GENERIC_IOMAP") replacing a simple custom implementation of the same
interfaces, but at the time the asm-generic/io.h version was not usable
yet. Since the header is now always included, it's now possible to go
back to the even simpler version.
Use the normal GENERIC_PCI_IOMAP functionality for all mips platforms
without the hacky GENERIC_IOMAP, and provide a custom pci_iounmap()
for the CONFIG_PCI_DRIVERS_LEGACY case to ensure the I/O port base never
gets unmapped.
The readsl() prototype needs an extra 'const' keyword to make it
compatible with the generic ioread32_rep() alias.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This reverts commit eff6c8ce8d4d7faef75f66614dd20bb50595d261.
Hazem reported a 30% drop in UnixBench spawn test with commit
eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
(aarch64) (single level MC sched domain):
https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com
There is an early bail from sched_move_task() if p->sched_task_group is
equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
(Ubuntu '22.04.5 LTS').
So in:
  do_exit()
    sched_autogroup_exit_task()
      sched_move_task()
        if sched_get_task_group(p) == p->sched_task_group
          return
        /* p is enqueued */
        dequeue_task()              \
        sched_change_group()        |
          task_change_group_fair()  |
            detach_task_cfs_rq()    |   (1)
            set_task_rq()           |
            attach_task_cfs_rq()    |
        enqueue_task()              /
(1) isn't called for p anymore.
Turns out that the regression is related to sgs->group_util in
group_is_overloaded() and group_has_capacity(). If (1) isn't called for
all the 'spawn' tasks then sgs->group_util is ~900 and
sgs->group_capacity = 1024 (single CPU sched domain) and this leads to
group_is_overloaded() returning true (2) and group_has_capacity() false
(3) much more often compared to the case when (1) is called.
I.e. there are many more cases of 'group_is_overloaded' and
'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which
then much more often returns a CPU != smp_processor_id() (5).
This isn't good for these extremely short running tasks (FORK + EXIT)
and also involves calling sched_balance_find_dst_group_cpu() unnecessarily
(single CPU sched domain).
Instead if (1) is called for 'p->flags & PF_EXITING' then the path
(4),(6) is taken much more often.
  select_task_rq_fair(..., wake_flags = WF_FORK)
    cpu = smp_processor_id()
    new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)
      group = sched_balance_find_dst_group(..., cpu)
        do {
          update_sg_wakeup_stats()
            sgs->group_type = group_classify()
              if group_is_overloaded()                 (2)
                return group_overloaded
              if !group_has_capacity()                 (3)
                return group_fully_busy
              return group_has_spare                   (4)
        } while group
        if local_sgs.group_type > idlest_sgs.group_type
          return idlest                                (5)
        case group_has_spare:
          if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
            return NULL                                (6)
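To see why a group_util of ~900 against a group_capacity of 1024 trips (2), here is a small sketch of the utilization comparison only. It assumes the check has the form capacity * 100 < util * imbalance_pct and that imbalance_pct is 117 for this domain; both are assumptions for illustration, and the nr_running and runnable-based conditions of the real function are left out.

```c
#include <stdbool.h>

/*
 * Simplified utilization test, modelled on group_is_overloaded().
 * Only the util-vs-capacity comparison is shown; the real function
 * has further conditions that are omitted here.
 */
bool sketch_util_overloaded(unsigned long capacity, unsigned long util,
			    unsigned int imbalance_pct)
{
	return capacity * 100 < util * imbalance_pct;
}
```

With these assumed numbers, 1024 * 100 = 102400 is below 900 * 117 = 105300, so the group counts as overloaded; at a utilization of 800 the right-hand side is 93600 and the test fails.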
Unixbench Tests './Run -c 4 spawn' on:
(a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
and Ubuntu 22.04.5 LTS (aarch64).
Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.
    w/o patch    w/ patch
    21005        27120
(b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
Ubuntu 22.04.5 LTS (x86_64).
Shell & test run in '/A'.
    w/o patch    w/ patch
    67675        88806
CONFIG_SCHED_AUTOGROUP=y & /proc/sys/kernel/sched_autogroup_enabled equal
0 or 1.
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Hagar Hemdan <hagarhem@amazon.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com
|
|
Repeat calls of static_branch_enable() to an already enabled
static key introduce overhead, because it calls cpus_read_lock().
Users may frequently set the uclamp value of tasks, triggering
the repeat enabling of the sched_uclamp_used static key.
Optimize this and avoid repeat calls to static_branch_enable()
by checking whether it's enabled already.
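The pattern can be modelled in user space as follows. This is a sketch only: `struct sketch_key` and both functions are hypothetical, and the counter merely stands in for the cost of taking cpus_read_lock() inside static_branch_enable().

```c
#include <stdbool.h>

/* Hypothetical user-space model of a static key. */
struct sketch_key {
	bool enabled;
	unsigned int slow_path_calls;	/* simulated cpus_read_lock() trips */
};

/* Models the unconditional enable: always pays for the lock. */
void sketch_key_enable(struct sketch_key *key)
{
	key->slow_path_calls++;
	key->enabled = true;
}

/* The optimization: test the key first, enable only if still disabled. */
void sketch_key_enable_once(struct sketch_key *key)
{
	if (!key->enabled)
		sketch_key_enable(key);
}
```

However often a task's uclamp value is set, the slow path in this model runs once.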
[ mingo: Rewrote the changelog for legibility ]
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20250219093747.2612-2-xuewen.yan@unisoc.com
|
|
Don't open-code static_branch_unlikely(&sched_uclamp_used), we have
the uclamp_is_used() wrapper around it.
[ mingo: Clean up the changelog ]
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20250219093747.2612-1-xuewen.yan@unisoc.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.14-rc7
- omap: fixed irq ACKS to avoid irq storming and system hang.
- ali1535, ali15x3, sis630: fixed error path at probe exit.
|
|
In preparation for the partial removal of NULL dst acomp support,
remove the tests for them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This adds request chaining and virtual address support to the
acomp interface.
It is identical to the ahash interface, except that a new flag
CRYPTO_ACOMP_REQ_NONDMA has been added to indicate that the
virtual addresses are not suitable for DMA. This is because
all existing and potential acomp users can provide memory that
is suitable for DMA so there is no need for a fall-back copy
path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Disable BH when taking per-cpu spin locks. This isn't an issue
right now because the only user zswap calls scomp from process
context. However, if scomp is called from softirq context the
spin lock may dead-lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Rather than allocating the stream memory in the request object,
move it into a per-cpu buffer managed by scomp. This takes the
stress off the user from having to manage large request objects
and setting up their own per-cpu buffers in order to do so.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The tfm argument is completely unused and meaningless as the
same stream object is identical over all transforms of a given
algorithm. Remove it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add a cra_type->destroy hook so that resources can be freed after
the last user of a registered algorithm is gone.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
We are trying to get rid of all multiplications from allocation
functions to prevent potential integer overflows. Here the
multiplication is probably safe, but using kcalloc() is more
appropriate and improves readability. This patch has no effect
on runtime behavior.
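For readers unfamiliar with the concern, a user-space sketch of the difference: an open-coded n * size can wrap around and silently under-allocate, while an array allocator checks the multiplication first. `sketch_calloc()` is a hypothetical helper written for this illustration, not the kernel's kcalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Array allocation with an explicit overflow check, in the spirit of
 * kcalloc(): refuse the request instead of allocating a wrapped size.
 */
void *sketch_calloc(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

A request for SIZE_MAX / 2 elements of 4 bytes fails cleanly here, whereas malloc(n * size) would be handed a small wrapped value.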
Link: https://github.com/KSPP/linux/issues/162 [1]
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Mark the src.virt.addr field in struct skcipher_walk as a pointer
to const data. This guarantees that the user won't modify the data
which should be done through dst.virt.addr to ensure that flushing
is done when necessary.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Reuse the addr field from struct scatter_walk for skcipher_walk.
Keep the existing virt.addr fields but make them const for the
user to access the mapped address.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add memcpy_sglist which copies one SG list to another.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Rather than returning the address and storing the length into an
argument pointer, add an address field to the walk struct and use
that to store the address. The length is returned directly.
Change the done functions to use this stored address instead of
getting them from the caller.
Split the address into two using a union. The user should only
access the const version so that it is never changed.
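A simplified sketch of that shape, under stated assumptions: `struct sketch_walk` walks a flat buffer rather than a scatterlist, and all names (including `addr_priv`) are hypothetical. It only shows the address living in the walk state, the length being returned directly, and the const/non-const union split.

```c
#include <stddef.h>

/* Hypothetical, simplified walk state over a flat buffer. */
struct sketch_walk {
	unsigned char *buf;	/* stands in for the SG list */
	size_t len;
	size_t pos;
	union {
		void *const addr;	/* the view users should read */
		void *addr_priv;	/* written only by the walk helpers */
	};
};

/* Maps the next chunk: stores the address in the walk, returns the length. */
size_t sketch_walk_next(struct sketch_walk *walk, size_t total)
{
	size_t n = walk->len - walk->pos;

	if (n > total)
		n = total;
	walk->addr_priv = walk->buf + walk->pos;
	return n;
}

/* Finishes a chunk using the stored address; the caller passes no pointer. */
void sketch_walk_done(struct sketch_walk *walk, size_t nbytes)
{
	walk->pos += nbytes;
	walk->addr_priv = NULL;
}
```

Because users only see the const member, an accidental assignment to walk->addr is rejected at compile time.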
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Additions to the error enum after the explicit 0x27 setting for
SEV_RET_INVALID_KEY lead to incorrect value assignments.
Use explicit values to match the manufacturer specifications more
clearly.
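The failure mode is plain C enum behavior, shown here with hypothetical names. Only the 0x27 value comes from the text above; the 0x30 value is invented for the illustration and is not taken from any specification.

```c
/* Implicit numbering: the entry after 0x27 silently becomes 0x28. */
enum sketch_implicit {
	SKETCH_A_INVALID_KEY = 0x27,
	SKETCH_A_NEXT,		/* 0x28, whatever the spec actually says */
};

/* Explicit numbering: every entry states the value it must have. */
enum sketch_explicit {
	SKETCH_B_INVALID_KEY = 0x27,
	SKETCH_B_NEXT = 0x30,	/* hypothetical spec value */
};
```

With explicit values, adding or reordering entries cannot shift the codes that follow.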
Fixes: 3a45dc2b419e ("crypto: ccp: Define the SEV-SNP commands")
CC: stable@vger.kernel.org
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Device nodes are enabled by default, so no need for 'status = "okay"' in
the DTS example.
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Herbert asks for long-term maintenance of everything under
crypto/asymmetric_keys/ and associated algorithms (ECDSA, GOST, RSA) [1].
Ignat has kindly agreed to co-maintain this with me going forward.
Stefan has agreed to be added as reviewer for ECDSA. He introduced it
in 2021 and has been meticulously providing reviews for 3rd party
patches anyway.
Retain David Howells' maintainer entry until he explicitly requests to
be removed. He originally introduced asymmetric keys in 2012.
RSA was introduced by Tadeusz Struk as an employee of Intel in 2015,
but he's changed jobs and last contributed to the implementation in 2016.
GOST was introduced by Vitaly Chikunov as an employee of Basealt LLC [2]
(Базальт СПО [3]) in 2019. This company is an OFAC sanctioned entity
[4][5], which makes employees ineligible as maintainer [6]. It's not
clear if Vitaly is still working for Basealt, he did not immediately
respond to my e-mail. Since knowledge and use of GOST algorithms is
relatively limited outside the Russian Federation, assign "Odd fixes"
status for now.
[1] https://lore.kernel.org/r/Z8QNJqQKhyyft_gz@gondor.apana.org.au/
[2] https://prohoster.info/ru/blog/novosti-interneta/reliz-yadra-linux-5-2
[3] https://www.basealt.ru/
[4] https://ofac.treasury.gov/recent-actions/20240823
[5] https://sanctionssearch.ofac.treas.gov/Details.aspx?id=50178
[6] https://lore.kernel.org/r/7ee74c1b5b589619a13c6318c9fbd0d6ac7c334a.camel@HansenPartnership.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This patch addresses an issue where authentication failures were being
erroneously reported due to negative test failures in the "ccm(aes)"
selftest.
Use pr_debug to suppress the unnecessary noise from these tests.
Signed-off-by: Shashank Gupta <shashankg@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Recently the ZSTD tree has been removed from linux-next due to lack of
updates for the last year [1]. As there are several users of ZSTD in the
kernel we need to keep the code maintained and updated. I'll act as a
backup to get ZSTD upstream into Linux, Nick is OK with that [2].
[1] https://lore.kernel.org/all/20250216224200.50b9dd6a@canb.auug.org.au/
[2] https://github.com/facebook/zstd/issues/4262#issuecomment-2691527952
CC: Nick Terrell <terrelln@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
virtcrypto_clear_request() does the same as the code here, but uses
kfree_sensitive() for one of the free operations.
So, better safe than sorry, use virtcrypto_clear_request() directly to
save a few lines of code and cleanly free the memory.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
async_xor_val has been unused since commit
a7c224a820c3 ("md/raid5: convert to new xor compution interface")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Pull smb server fixes from Steve French:
- Two fixes for oplock break/lease races
* tag 'v6.14-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: prevent connection release during oplock break notification
ksmbd: fix use-after-free in ksmbd_free_work_struct
|
|
Parameterize the basic testing to generate directly a perf.data file
or to generate/use one from pipe input or output. To simplify the
refactor move some of the head/grep logic around. Use "-q" with grep
to make the test output cleaner.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311211635.541090-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When make_data fails its error message needs to go to stderr rather
than stdout and the stdout value is captured in a variable. Quote the
$err value so that it is always a valid input for test. This error is
commonly encountered if no sample data is gathered by the test.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312001841.1515779-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Commit dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional")
missed the changes to one file, resulting in this build failure:
$ make LIBUNWIND=1 -C tools/perf O=/tmp/build/perf-tools-next install-bin
<SNIP>
CC /tmp/build/perf-tools-next/util/unwind-libunwind-local.o
CC /tmp/build/perf-tools-next/util/unwind-libunwind.o
<SNIP>
util/unwind-libunwind-local.c: In function ‘access_mem’:
util/unwind-libunwind-local.c:582:56: error: ‘ui->sample->user_regs’ is a pointer; did you mean to use ‘->’?
582 | if (__write || !stack || !ui->sample->user_regs.regs) {
| ^
| ->
util/unwind-libunwind-local.c:587:38: error: passing argument 2 of ‘perf_reg_value’ from incompatible pointer type [-Wincompatible-pointer-types]
587 | ret = perf_reg_value(&start, &ui->sample->user_regs,
| ^~~~~~~~~~~~~~~~~~~~~~
| |
| struct regs_dump **
<SNIP>
⬢ [acme@toolbox perf-tools-next]$ git bisect bad
dc6d2bc2d893a878e7b58578ff01b4738708deb4 is the first bad commit
commit dc6d2bc2d893a878e7b58578ff01b4738708deb4 (HEAD)
Author: Ian Rogers <irogers@google.com>
Date: Mon Jan 13 11:43:45 2025 -0800
perf sample: Make user_regs and intr_regs optional
Detected using:
make -C tools/perf build-test
Fixes: dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250313033121.758978-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Single caller, so inline it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Found with CC=clang W=1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use max() to simplify gen_after() and improve its readability.
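A sketch of the kind of simplification meant here, assuming gen_after() returns how far one 8-bit generation number is ahead of another, clamped at zero. That behavior, the function bodies and the `SKETCH_MAX` macro are assumptions for illustration, not the actual bcachefs code.

```c
#include <stdint.h>

#define SKETCH_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Open-coded clamp of the wrapping 8-bit difference. */
int sketch_gen_after_old(uint8_t a, uint8_t b)
{
	/* The uint8_t to int8_t conversion wraps on gcc and clang. */
	int r = (int8_t)(uint8_t)(a - b);

	return r > 0 ? r : 0;
}

/* Same result, with the clamp expressed through a max helper. */
int sketch_gen_after_new(uint8_t a, uint8_t b)
{
	return SKETCH_MAX((int8_t)(uint8_t)(a - b), 0);
}
```

Both variants treat the generations as wrapping, so 1 is two steps after 255.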
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The extra byte is not used - remove it.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
And the stripes heap gets deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a simple tracepoint for stripe creation, we'll want to expand this
later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert to the new persistent stripe LRU.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert to the new persistent stripe LRU.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).
But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.
This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.
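One way such a timeout could work, as a minimal sketch: remember when the current unbroken run of write failures began, and mark the device read-only once that run has lasted longer than the timeout. The struct, field names and time units are hypothetical and do not reflect the patch's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device write error state. */
struct sketch_dev {
	bool failing;		/* inside an unbroken run of write errors */
	uint64_t failing_since;	/* when that run started, in seconds */
	bool read_only;
};

/* Account one write completion at time 'now'. */
void sketch_account_write(struct sketch_dev *d, bool ok,
			  uint64_t now, uint64_t timeout)
{
	if (ok) {
		d->failing = false;	/* any success ends the run */
		return;
	}

	if (!d->failing) {
		d->failing = true;
		d->failing_since = now;
	} else if (now - d->failing_since >= timeout) {
		d->read_only = true;	/* failing for too long: kick it out */
	}
}
```

A single failed write does nothing on its own; only failures that persist past the timeout without an intervening success take the device out of the write path, which is what lets retries terminate.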
In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".
Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.
After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we wouldn't try to read at all from a failed device - that
doesn't make much sense, the device may be unhealthy (perhaps taking
longer than it should to service reads), but if it's our only option we
should still try to read from it.
Now, bch2_bkey_pick_read_device() will pick failed devices only if there
are no non-failed replicas to read from.
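The selection rule can be sketched like this. It is an illustration only: `struct sketch_replica` and `sketch_pick_read_device()` are hypothetical, and the real bch2_bkey_pick_read_device() walks extent pointers and weighs other factors that are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical replica descriptor. */
struct sketch_replica {
	int dev;
	bool failed;
};

/*
 * Returns the index of the replica to read from, or -1 if there is
 * none. Healthy replicas always win; a failed one is chosen only as
 * a last resort.
 */
int sketch_pick_read_device(const struct sketch_replica *r, size_t nr)
{
	int fallback = -1;

	for (size_t i = 0; i < nr; i++) {
		if (!r[i].failed)
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```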
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The next patch implementing freezing will change bch2_dev_get_ioref() to
sleep if a device is currently frozen.
Add an annotation and fix the journal code accordingly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was completely fubar; it's now simplified a bit as well.
Note that for_each_online_member() takes and releases io_refs as it
iterates, so we need to release that if we break.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't use the standard fs_holder_ops because they're meant for single
device filesystems - fs_bdev_mark_dead() in particular - and they assume
that the blk_holder is the super_block, which also doesn't work for a
multi device filesystem.
These generally follow the standard fs_holder_ops; the
locking/refcounting is a bit simplified because c->ro_ref suffices, and
bch2_fs_bdev_mark_dead() is not necessarily shutting down the entire
filesystem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is necessary for the new blk_holder_ops, which want the vfs
super_block available for synchronization.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Note that we open block devices before we allocate bch_fs, but once
attached to a filesystem they will be closed before the bch_fs is torn
down - so stashing a pointer without a refcount looks incorrect but it's
not.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More prep work for automatically kicking devices out after too many IO
errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We need to start accounting successes for every IO, not just failures,
so introduce a unified hook for io completion accounting and convert
io_read.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We were using our device pointer after we'd released our ref to it.
Unlikely to be a race that's practical to hit, since actually removing a
member device is a whole process besides just taking it offline, but -
needs to be fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a device is ro or failed, we might not have anywhere to move a
replica.
Check for this early, before doing the read and attempting to write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This implements a new extent field bitflags that apply to the whole
extent. There's been a couple things we've wanted this for in the past,
but the immediate need is extent poisoning, to solve a rebalance issue.
Unknown extent fields can't be parsed (we won't know their size, so we
can't advance to the next field), so this is an incompat feature, and
using it prevents the filesystem from being mounted by old versions.
This also adds the BCH_EXTENT_poisoned flag; this indicates that the
data is known to be bad (i.e. there was a checksum error, and we had to
write a new checksum) and reads will return errors.
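A minimal sketch of a whole-extent flags word with a poisoned bit, assuming the flag is a bit number into a 64-bit field and that reads of a poisoned extent fail with -EIO. The layout, the names and the error code are all assumptions made for the illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit numbers for whole-extent flags. */
enum sketch_extent_flag {
	SKETCH_EXTENT_poisoned = 0,
};

/* Hypothetical extent carrying one flags field for the whole extent. */
struct sketch_extent {
	uint64_t flags;
};

/* Mark the data as known bad, e.g. after an unrecoverable checksum error. */
void sketch_extent_poison(struct sketch_extent *e)
{
	e->flags |= UINT64_C(1) << SKETCH_EXTENT_poisoned;
}

/* Reads of a poisoned extent report an error instead of returning data. */
int sketch_extent_read(const struct sketch_extent *e)
{
	if (e->flags & (UINT64_C(1) << SKETCH_EXTENT_poisoned))
		return -EIO;
	return 0;
}
```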
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|