Age | Commit message (Collapse) | Author |
|
The downstream patches add more functionality to irq_pool_affinity.
Move the irq_pool_affinity logic to a new file in order to ease the
coding and maintenance of it.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Move affinity binding of the IRQ to irq_request function in order to
bind the IRQ before inserting it to the xarray.
After this change, the IRQ is ready for use when inserted to the xarray.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, IRQ layer have a separate flow for ctrl and comp IRQs, and
the distinction between ctrl and comp IRQs is done in the IRQ layer.
In order to ease the coding and maintenance of the IRQ layer,
introduce a new API for requesting control IRQs -
mlx5_ctrl_irq_request(struct mlx5_core_dev *dev).
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Callers of this functions ignore its return value, as reported by
Wang Qing, in one of the return paths, it returns positive values.
Since return value is ignored anyways, void out the return type of the
function.
Reported-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
At the moment, urandom_read() (used for /dev/urandom) resets crng_init_cnt
to zero when it is called at crng_init<2. This is inconsistent: We do it
for /dev/urandom reads, but not for the equivalent
getrandom(GRND_INSECURE).
(And worse, as Jason pointed out, we're only doing this as long as
maxwarn>0.)
crng_init_cnt is only read in crng_fast_load(); it is relevant at
crng_init==0 for determining when to switch to crng_init==1 (and where in
the RNG state array to write).
As far as I understand:
- crng_init==0 means "we have nothing, we might just be returning the same
exact numbers on every boot on every machine, we don't even have
non-cryptographic randomness; we should shove every bit of entropy we
can get into the RNG immediately"
- crng_init==1 means "well we have something, it might not be
cryptographic, but at least we're not gonna return the same data every
time or whatever, it's probably good enough for TCP and ASLR and stuff;
we now have time to build up actual cryptographic entropy in the input
pool"
- crng_init==2 means "this is supposed to be cryptographically secure now,
but we'll keep adding more entropy just to be sure".
The current code means that if someone is pulling data from /dev/urandom
fast enough at crng_init==0, we'll keep resetting crng_init_cnt, and we'll
never make forward progress to crng_init==1. It seems to be intended to
prevent an attacker from bruteforcing the contents of small individual RNG
inputs on the way from crng_init==0 to crng_init==1, but that's misguided;
crng_init==1 isn't supposed to provide proper cryptographic security
anyway, RNG users who care about getting secure RNG output have to wait
until crng_init==2.
This code was inconsistent, and it probably made things worse - just get
rid of it.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
RDRAND is not fast. RDRAND is actually quite slow. We've known this for
a while, which is why functions like get_random_u{32,64} were converted
to use batching of our ChaCha-based CRNG instead.
Yet CRNG extraction still includes a call to RDRAND, in the hot path of
every call to get_random_bytes(), /dev/urandom, and getrandom(2).
This call to RDRAND here seems quite superfluous. CRNG is already
extracting things based on a 256-bit key, based on good entropy, which
is then reseeded periodically, updated, backtrack-mutated, and so
forth. The CRNG extraction construction is something that we're already
relying on to be secure and solid. If it's not, that's a serious
problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
is going to alleviate things.
And in the case where the CRNG doesn't have enough entropy yet, we're
already initializing the ChaCha key row with RDRAND in
crng_init_try_arch_early().
Removing the call to RDRAND improves performance on an i7-11850H by
370%. In other words, the vast majority of the work done by
extract_crng() prior to this commit was devoted to fetching 32 bits of
RDRAND.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Previously, the ChaCha constants for the primary pool were only
initialized in crng_initialize_primary(), called by rand_initialize().
However, some randomness is actually extracted from the primary pool
beforehand, e.g. by kmem_cache_create(). Therefore, statically
initialize the ChaCha constants for the primary pool.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <linux-crypto@vger.kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Rather than an awkward combination of ifdefs and __maybe_unused, we can
ensure more source gets parsed, regardless of the configuration, by
using IS_ENABLED for the CONFIG_NUMA conditional code. This makes things
cleaner and easier to follow.
I've confirmed that on !CONFIG_NUMA, we don't wind up with excess code
by accident; the generated object file is the same.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We print out "crng init done" for !TRUST_CPU, so we should also print
out the same for TRUST_CPU.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
If we're trusting bootloader randomness, crng_fast_load() is called by
add_hwgenerator_randomness(), which sets us to crng_init==1. However,
usually it is only called once for an initial 64-byte push, so bootloader
entropy will not mix any bytes into the input pool. So it's conceivable
that crng_init==1 when crng_initialize_primary() is called later, but
then the input pool is empty. When that happens, the crng state key will
be overwritten with extracted output from the empty input pool. That's
bad.
In contrast, if we're not trusting bootloader randomness, we call
crng_slow_load() *and* we call mix_pool_bytes(), so that later
crng_initialize_primary() isn't drawing on nothing.
In order to prevent crng_initialize_primary() from extracting an empty
pool, have the trusted bootloader case mirror that of the untrusted
bootloader case, mixing the input into the pool.
[linux@dominikbrodowski.net: rewrite commit message]
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
When crng_fast_load() is called by add_hwgenerator_randomness(), we
currently will advance to crng_init==1 once we've acquired 64 bytes, and
then throw away the rest of the buffer. Usually, that is not a problem:
When add_hwgenerator_randomness() gets called via EFI or DT during
setup_arch(), there won't be any IRQ randomness. Therefore, the 64 bytes
passed by EFI exactly matches what is needed to advance to crng_init==1.
Usually, DT seems to pass 64 bytes as well -- with one notable exception
being kexec, which hands over 128 bytes of entropy to the kexec'd kernel.
In that case, we'll advance to crng_init==1 once 64 of those bytes are
consumed by crng_fast_load(), but won't continue onward feeding in bytes
to progress to crng_init==2. This commit fixes the issue by feeding
any leftover bytes into the next phase in add_hwgenerator_randomness().
[linux@dominikbrodowski.net: rewrite commit message]
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
If the bootloader supplies sufficient material and crng_reseed() is called
very early on, but not too early that wqs aren't available yet, then we
might transition to crng_init==2 before rand_initialize()'s call to
crng_initialize_primary() made. Then, when crng_initialize_primary() is
called, if we're trusting the CPU's RDRAND instructions, we'll
needlessly reinitialize the RNG and emit a message about it. This is
mostly harmless, as numa_crng_init() will allocate and then free what it
just allocated, and excessive calls to invalidate_batched_entropy()
aren't so harmful. But it is funky and the extra message is confusing,
so avoid the re-initialization all together by checking for crng_init <
2 in crng_initialize_primary(), just as we already do in crng_reseed().
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, multiple calls
to add_bootloader_randomness() are broken and can cause a NULL pointer
dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
problem, as qemu on arm64 may provide bootloader entropy via EFI and via
devicetree.
On the first call to add_hwgenerator_randomness(), crng_fast_load() is
executed, and if the seed is long enough, crng_init will be set to 1.
On subsequent calls to add_bootloader_randomness() and then to
add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
wait_event_interruptible() and then credit_entropy_bits() will be called.
If the entropy count for that second seed is large enough, that proceeds
to crng_reseed().
However, both wait_event_interruptible() and crng_reseed() depends
(at least in numa_crng_init()) on workqueues. Therefore, test whether
system_wq is already initialized, which is a sufficient indicator that
workqueue_init_early() has progressed far enough.
If we wind up hitting the !system_wq case, we later want to do what
would have been done there when wqs are up, so set a flag, and do that
work later from the rand_initialize() call.
Reported-by: Ivan T. Ivanov <iivanov@suse.de>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Cc: stable@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
[Jason: added crng_need_done state and related logic.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
By using `char` instead of `unsigned char`, certain platforms will sign
extend the byte when `w = rol32(*bytes++, input_rotate)` is called,
meaning that bit 7 is overrepresented when mixing. This isn't a real
problem (unless the mixer itself is already broken) since it's still
invertible, but it's not quite correct either. Fix this by using an
explicit unsigned type.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This commit addresses one of the lower hanging fruits of the RNG: its
usage of SHA1.
BLAKE2s is generally faster, and certainly more secure, than SHA1, which
has [1] been [2] really [3] very [4] broken [5]. Additionally, the
current construction in the RNG doesn't use the full SHA1 function, as
specified, and allows overwriting the IV with RDRAND output in an
undocumented way, even in the case when RDRAND isn't set to "trusted",
which means potential malicious IV choices. And its short length means
that keeping only half of it secret when feeding back into the mixer
gives us only 2^80 bits of forward secrecy. In other words, not only is
the choice of hash function dated, but the use of it isn't really great
either.
This commit aims to fix both of these issues while also keeping the
general structure and semantics as close to the original as possible.
Specifically:
a) Rather than overwriting the hash IV with RDRAND, we put it into
BLAKE2's documented "salt" and "personal" fields, which were
specifically created for this type of usage.
b) Since this function feeds the full hash result back into the
entropy collector, we only return from it half the length of the
hash, just as it was done before. This increases the
construction's forward secrecy from 2^80 to a much more
comfortable 2^128.
c) Rather than using the raw "sha1_transform" function alone, we
instead use the full proper BLAKE2s function, with finalization.
This also has the advantage of supplying 16 bytes at a time rather than
SHA1's 10 bytes, which, in addition to having a faster compression
function to begin with, means faster extraction in general. On an Intel
i7-11850H, this commit makes initial seeding around 131% faster.
BLAKE2s itself has the nice property of internally being based on the
ChaCha permutation, which the RNG is already using for expansion, so
there shouldn't be any issue with newness, funkiness, or surprising CPU
behavior, since it's based on something already in use.
[1] https://eprint.iacr.org/2005/010.pdf
[2] https://www.iacr.org/archive/crypto2005/36210017/36210017.pdf
[3] https://eprint.iacr.org/2015/967.pdf
[4] https://shattered.io/static/shattered.pdf
[5] https://www.usenix.org/system/files/sec20-leurent.pdf
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In preparation for using blake2s in the RNG, we change the way that it
is wired-in to the build system. Instead of using ifdefs to select the
right symbol, we use weak symbols. And because ARM doesn't need the
generic implementation, we make the generic one default only if an arch
library doesn't need it already, and then have arch libraries that do
need it opt-in. So that the arch libraries can remain tristate rather
than bool, we then split the shash part from the glue code.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
_extract_crng() does plain loads of crng->init_time and
crng_global_init_time, which causes undefined behavior if
crng_reseed() and RNDRESEEDCRNG modify these corrently.
Use READ_ONCE() and WRITE_ONCE() to make the behavior defined.
Don't fix the race on crng->init_time by protecting it with crng->lock,
since it's not a problem for duplicate reseedings to occur. I.e., the
lockless access with READ_ONCE() is fine.
Fixes: d848e5f8e1eb ("random: add new ioctl RNDRESEEDCRNG")
Fixes: e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
extract_crng() and crng_backtrack_protect() load crng_node_pool with a
plain load, which causes undefined behavior if do_numa_crng_init()
modifies it concurrently.
Fix this by using READ_ONCE(). Note: as per the previous discussion
https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u,
READ_ONCE() is believed to be sufficient here, and it was requested that
it be used here instead of smp_load_acquire().
Also change do_numa_crng_init() to set crng_node_pool using
cmpxchg_release() instead of mb() + cmpxchg(), as the former is
sufficient here but is more lightweight.
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since commit
ee3e00e9e7101 ("random: use registers from interrupted code for CPU's w/o a cycle counter")
the irq_flags argument is no longer used.
Remove unused irq_flags.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The section at the top of random.c which documents the input functions
available does not document add_hwgenerator_randomness() which might lead
a reader to overlook it. Add a brief note about it.
Signed-off-by: Mark Brown <broonie@kernel.org>
[Jason: reorganize position of function in doc comment and also document
add_bootloader_randomness() while we're at it.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
ssh://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-12-31:
amdgpu:
- Suspend/resume fix
- Restore runtime pm behavior with efifb
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211231143825.11479-1-alexander.deucher@amd.com
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the firmware efi sysfs code to use default_groups
field which has been the preferred way since aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In an effort to ensure the initrd observed and used by the OS is
the same one that was meant to be loaded, which is difficult to
guarantee otherwise, let's measure the initrd if the EFI stub and
specifically the newly introduced LOAD_FILE2 protocol was used.
Modify the initrd loading sequence so that the contents of the initrd
are measured into PCR9. Note that the patch is currently using
EV_EVENT_TAG to create the eventlog entry instead of EV_IPL. According
to the TCP PC Client specification this is used for PCRs defined for OS
and application usage.
Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20211119114745.1560453-5-ilias.apalodimas@linaro.org
[ardb: add braces to initializer of tagged_event_data]
Link: https://github.com/ClangBuiltLinux/linux/issues/1547
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the md rdev sysfs code to use default_groups field which
has been the preferred way since commit aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Song Liu <song@kernel.org>
|
|
kfree() and bitmap_free() are the same. But using the latter is more
consistent when freeing memory allocated with bitmap_zalloc().
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a bitmap is local to a function, it is safe to use the non-atomic
__[set|clear]_bit(). No concurrent accesses can occur.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The 'possible_idx' bitmap is set just after it is zeroed, so we can save
the first step.
The 'free_idx' bitmap is used only at the end of the function as the
result of a bitmap xor operation. So there is no need to explicitly
zero it before.
So, slightly simply the code and remove 2 useless 'bitmap_zero()' call
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In current switchdev implementation, every VF PR is assigned to
individual ring on switchdev ctrl VSI. For slow-path traffic, there
is a mapping VF->ring done in software based on src_vsi value (by
calling ice_eswitch_get_target_netdev function).
With this change, HW solution is introduced which is more
efficient. For each VF, src MAC (VF's MAC) filter will be created,
which forwards packets to the corresponding switchdev ctrl VSI queue
based on src MAC address.
This filter has to be removed and then replayed in case of
resetting one VF. Keep information about this rule in repr->mac_rule,
thanks to that we know which rule has to be removed and replayed
for a given VF.
In case of CORE/GLOBAL all rules are removed
automatically. We have to take care of readding them. This is done
by ice_replay_vsi_adv_rule.
When driver leaves switchdev mode, remove all advanced rules
from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi.
Flag repr->rule_added is needed because in some cases reset
might be triggered before VF sends request to add MAC.
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In case of error, memremap() returns NULL pointer not
ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Fixes: 0db89fa243e5 ("ACPI: Introduce Platform Firmware Runtime Update device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The following changes were made:
1. Align function signatures to 80 characters per line.
2. Remove tabs for variable assignment and use 1 space instead.
3. Don't compare to NULL in "if" clause.
4. Remove strange indentations.
This will ease on the maintenance of the driver for the future.
Link: https://lore.kernel.org/r/20211215135721.3662-7-mgurtovoy@nvidia.com
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The AMD P-State driver is based on ACPI CPPC function, so ACPI should be
dependence of this driver in the kernel config.
In file included from ../drivers/cpufreq/amd-pstate.c:40:0:
../include/acpi/processor.h:226:2: error: unknown type name ‘phys_cpuid_t’
phys_cpuid_t phys_id; /* CPU hardware ID such as APIC ID for x86 */
^~~~~~~~~~~~
../include/acpi/processor.h:355:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_get_phys_id(acpi_handle, int type, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
CC drivers/rtc/rtc-rv3029c2.o
../include/acpi/processor.h:356:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_map_madt_entry(u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
../include/acpi/processor.h:357:20: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
See https://lore.kernel.org/lkml/20e286d4-25d7-fb6e-31a1-4349c805aae3@infradead.org/.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add the description of @req and @boost_supported in struct amd_cpudata
kernel-doc comment to remove warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'req' not described in 'amd_cpudata'
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'boost_supported' not described in 'amd_cpudata'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ice_replay_vsi_adv_rule will replay advanced rules for a given VSI.
Exit this function when list of rules for given recipe is empty.
Do not add rule when given vsi_handle does not match vsi_handle
from the rule info.
Use ICE_MAX_NUM_RECIPES instead of ICE_SW_LKUP_LAST in order to find
advanced rules as well.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
bioset acct is only needed for raid0 and raid5. Therefore, md_run only
allocates it for raid0 and raid5. However, this does not cover
personality takeover, which may cause uninitialized bioset. For example,
the following repro steps:
mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
mdadm --wait /dev/md0
mkfs.xfs /dev/md0
mdadm /dev/md0 --grow -l5
mount /dev/md0 /mnt
causes panic like:
[ 225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 225.934903] #PF: supervisor instruction fetch in kernel mode
[ 225.935639] #PF: error_code(0x0010) - not-present page
[ 225.936361] PGD 0 P4D 0
[ 225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
[ 225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
[ 225.939922] RIP: 0010:0x0
[ 225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
[ 225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
[ 225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
[ 225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
[ 225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
[ 225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
[ 225.946709] FS: 00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
[ 225.947814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
[ 225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 225.951414] Call Trace:
[ 225.951787] <TASK>
[ 225.952120] mempool_alloc+0xe5/0x250
[ 225.952625] ? mempool_resize+0x370/0x370
[ 225.953187] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.953862] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.954464] ? sched_clock_cpu+0x15/0x120
[ 225.955019] ? find_held_lock+0xac/0xd0
[ 225.955564] bio_alloc_bioset+0x1ed/0x2a0
[ 225.956080] ? lock_downgrade+0x3a0/0x3a0
[ 225.956644] ? bvec_alloc+0xc0/0xc0
[ 225.957135] bio_clone_fast+0x19/0x80
[ 225.957651] raid5_make_request+0x1370/0x1b70
[ 225.958286] ? sched_clock_cpu+0x15/0x120
[ 225.958797] ? __lock_acquire+0x8b2/0x3510
[ 225.959339] ? raid5_get_active_stripe+0xce0/0xce0
[ 225.959986] ? lock_is_held_type+0xd8/0x130
[ 225.960528] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.961135] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.961703] ? sched_clock_cpu+0x15/0x120
[ 225.962232] ? lock_release+0x27a/0x6c0
[ 225.962746] ? do_wait_intr_irq+0x130/0x130
[ 225.963302] ? lock_downgrade+0x3a0/0x3a0
[ 225.963815] ? lock_release+0x6c0/0x6c0
[ 225.964348] md_handle_request+0x342/0x530
[ 225.964888] ? set_in_sync+0x170/0x170
[ 225.965397] ? blk_queue_split+0x133/0x150
[ 225.965988] ? __blk_queue_split+0x8b0/0x8b0
[ 225.966524] ? submit_bio_checks+0x3b2/0x9d0
[ 225.967069] md_submit_bio+0x127/0x1c0
[...]
Fix this by moving alloc/free of acct bioset to pers->run and pers->free.
While we are on this, properly handle md_integrity_register() error in
raid0_run().
Fixes: daee2024715d (md: check level before create and exit io_acct_set)
Cc: stable@vger.kernel.org
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Use the possessive "its" instead of the contraction "it's"
in printed messages.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
|
|
Returns EAGAIN in case the raid456 driver would block waiting for reshape.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
This adds nowait support to the RAID10 driver. Very similar to
raid1 driver changes. It makes RAID10 driver return with EAGAIN
for situations where it could wait for eg:
- Waiting for the barrier,
- Reshape operation,
- Discard operation.
wait_barrier() and regular_request_wait() fn are modified to return bool
to support error for wait barriers. They returns true in case of wait
or if wait is not required and returns false if wait was required
but not performed to support nowait.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
This adds nowait support to the RAID1 driver. It makes RAID1 driver
return with EAGAIN for situations where it could wait for eg:
- Waiting for the barrier,
wait_barrier() fn is modified to return bool to support error for
wait barriers. It returns true in case of wait or if wait is not
required and returns false if wait was required but not performed
to support nowait.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added support
for checking whether a given bdev supports handling of REQ_NOWAIT or not.
Since then commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable
it for linear target") added support for REQ_NOWAIT for dm. This uses
a similar approach to incorporate REQ_NOWAIT for md based bios.
This patch was tested using t/io_uring tool within FIO. A nvme drive
was partitioned into 2 partitions and a simple raid 0 configuration
/dev/md0 was created.
md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
937423872 blocks super 1.2 512k chunks
Before patch:
$ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
Running top while the above runs:
$ ps -eL | grep $(pidof io_uring)
38396 38396 pts/2 00:00:00 io_uring
38396 38397 pts/2 00:00:15 io_uring
38396 38398 pts/2 00:00:13 iou-wrk-38397
We can see iou-wrk-38397 io worker thread created which gets created
when io_uring sees that the underlying device (/dev/md0 in this case)
doesn't support nowait.
After patch:
$ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
Running top while the above runs:
$ ps -eL | grep $(pidof io_uring)
38341 38341 pts/2 00:10:22 io_uring
38341 38342 pts/2 00:10:37 io_uring
After running this patch, we don't see any io worker thread
being created which indicated that io_uring saw that the
underlying device does support nowait. This is the exact behaviour
noticed on a dm device which also supports nowait.
For all the other raid personalities except raid0, we would need
to train pieces which involves make_request fn in order for them
to correctly handle REQ_NOWAIT.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
As suggested by Neil Brown[1], this limitation seems to be
deprecated.
With plugging in use, writes are processed behind the raid thread
and conf->pending_count is not increased. This limitation occurs only
if caller doesn't use plugs.
It can be avoided and often it is (with plugging). There are no reports
that queue is growing to enormous size so remove queue limitation for
non-plugged IOs too.
[1] https://lore.kernel.org/linux-raid/162496301481.7211.18031090130574610495@noble.neil.brown.name
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
raid_run_ops() relies on the implicitly disabled preemption for
its percpu ops, although this is really about CPU locality. This
breaks RT semantics as it can take regular (and thus sleeping)
spinlocks, such as stripe_lock.
Add a local_lock such that non-RT does not change and continues
to be just map to preempt_disable/enable, but makes RT happy as
the region will use a per-CPU spinlock and thus be preemptible
and still guarantee CPU locality.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
We missed adding handle_err for gpi mode, so add a new function
spi_geni_handle_err() which would call handle_fifo_timeout() or newly
added handle_gpi_timeout() based on mode
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma")
Reported-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220103071118.27220-2-vkoul@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Before we invoke spi_finalize_current_transfer() in
spi_gsi_callback_result() we should set the spi->cur_msg->status as
appropriate (0 for success, error otherwise).
The helps to return error on transfer and not wait till it timesout on
error
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20220103071118.27220-1-vkoul@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
struct uart_port contains a cached copy of the Modem Control signals.
It is used to skip register writes in uart_update_mctrl() if the new
signal state equals the old signal state. It also avoids a register
read to obtain the current state of output signals.
When a uart_port is registered, uart_configure_port() changes signal
state but neglects to keep the cached copy in sync. That may cause
a subsequent register write to be incorrectly skipped. Fix it before
it trips somebody up.
This behavior has been present ever since the serial core was introduced
in 2002:
https://git.kernel.org/history/history/c/33c0d1b0c3eb
So far it was never an issue because the cached copy is initialized to 0
by kzalloc() and when uart_configure_port() is executed, at most DTR has
been set by uart_set_options() or sunsu_console_setup(). Therefore,
a stable designation seems unnecessary.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/bceeaba030b028ed810272d55d5fc6f3656ddddb.1641129752.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In this error handling, "transmit_chars_dma" function will call
"transmit_chars_pio" once per characters. But "transmit_chars_pio" will
continue to send characters while xmit buffer is not empty.
Remove this useless loop, one call is sufficient.
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-5-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If flow control is enabled, framework will call stop_tx to
pause transfer and then call start_tx to resume transfer.
Clear USART_CR3_DMAT bit in stop_tx ops to pause DMA transfer.
Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-4-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
TX DMA state condition is handled by tx_dma_busy boolean.
This boolean is set when dma descriptor is requested and reset when dma
channel is stopped (dma_terminate).
In stm32_usart_serial_remove(), stm32_usart_stop_tx() and
stm32_usart_transmit_chars_dma() fallback error case, DMA channel is
stopped but tx_dma_busy is not handled.
Rework the driver by using two new functions to solve this issue:
- stm32_usart_tx_dma_started return true if DMA TX have a descriptor.
- stm32_usart_tx_dma_enabled return true if DMAT bit is set.
stm32_usart_tx_dma_started uses tx_dma_busy flag to prevent dual DMA
transaction at the same time. This flag is set when a DMA transaction
begins and is unset when dmaengine_terminate_async function is called.
A new DMA transaction cannot be created if this flag is set.
Create a new function "stm32_usart_tx_dma_terminate" to be sure the flag
is unset after each call of dmaengine_terminate_async.
Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-3-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Terminate DMA transaction and clear CR3_DMAT when shutdown is requested,
instead of when remove is requested. If DMA transfer is not stopped in
shutdown ops, driver will fail to start a new DMA transfer after next
startup ops.
Fixes: 3489187204eb ("serial: stm32: adding dma support")
Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit d8d8ffa47783 ("amba-pl011: do not disable RTS during shutdown")
amended the PL011 serial driver to leave DTR/RTS polarity untouched on
tty close. That change made sense.
But the commit also added code to save DTR/RTS state to an internal
variable on tty close and restore it on tty open. That part of the
commit makes less sense: The driver has no ->pm() callback, so the uart
remains powered after tty close and automatically preserves register
state, including DTR/RTS.
Saving and restoring registers isn't the job of the ->startup() and
->shutdown() callbacks anyway. Rather, it should happen in ->pm().
Additionally, after pl011_startup() restores the state, the serial core
overrides it in uart_port_dtr_rts() if a baud rate has been set:
tty_port_open()
uart_port_activate()
uart_startup()
uart_port_startup()
pl011_startup() # restores DTR/RTS from uap->old_cr
tty_port_block_til_ready()
tty_port_raise_dtr_rts # if (C_BAUD(tty))
uart_dtr_rts()
uart_port_dtr_rts() # raises DTR/RTS
The serial core also overrides DTR/RTS on tty close in uart_shutdown()
if C_HUPCL(tty) is set. So a user-defined DTR/RTS polarity won't
survive a close/open cycle anyway, unless the user has set the baud rate
to zero and disabled hupcl on the tty.
Bottom line is, the code to save and restore DTR/RTS has no effect.
Remove it.
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/e22089ab49e6e78822c50c8c4db46bf3ee885623.1641129328.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
pl011_set_termios() briefly resets the CR register to zero, thereby
glitching DTR/RTS signals. With rs485 this may result in the bus being
occupied for no reason.
Where does this register write originate from?
The PL011 driver was forked from the PL010 driver in 2004:
https://git.kernel.org/history/history/c/157c0342e591
Until this commit, the PL010 driver's IRQ handler ambauart_int()
modified the CR register without holding the port spinlock.
ambauart_set_termios() also modified that register. To prevent
concurrent read-modify-writes by the IRQ handler and to prevent
transmission while changing baudrate, ambauart_set_termios() had to
disable interrupts. On the PL010, that is achieved by writing zero to
the CR register.
However, on the PL011, interrupts are disabled in the IMSC register,
not in the CR register.
Additionally, the commit amended both the PL010 and PL011 driver to
acquire the port spinlock in the IRQ handler, obviating the need to
disable interrupts in ->set_termios().
So the CR register write is obsolete for two reasons. Drop it.
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/f49f945375f5ccb979893c49f1129f51651ac738.1641129062.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|