Age | Commit message (Collapse) | Author |
|
Samuel Holland <samuel.holland@sifive.com> says:
This series fixes a couple of issues related to using the cbo.zero
instruction in userspace. The first patch fixes a bug where the wrong
enable bit gets set if the kernel is running in M-mode. The remaining
patches fix a bug where the enable bit gets reset to its default value
after a nonretentive idle state. I have hardware which reproduces this:
Before this series:
$ tools/testing/selftests/riscv/hwprobe/cbo
TAP version 13
1..3
ok 1 Zicboz block size
# Zicboz block size: 64
Illegal instruction
After applying this series:
$ tools/testing/selftests/riscv/hwprobe/cbo
TAP version 13
1..3
ok 1 Zicboz block size
# Zicboz block size: 64
ok 2 cbo.zero
ok 3 cbo.zero check
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
* b4-shazam-merge:
riscv: Save/restore envcfg CSR during CPU suspend
riscv: Add a custom ISA extension for the [ms]envcfg CSR
riscv: Fix enabling cbo.zero when running in M-mode
Link: https://lore.kernel.org/r/20240228065559.3434837-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The [ms]envcfg CSR was added in version 1.12 of the RISC-V privileged
ISA (aka S[ms]1p12). However, bits in this CSR are defined by several
other extensions which may be implemented separately from any particular
version of the privileged ISA (for example, some unrelated errata may
prevent an implementation from claiming conformance with Ss1p12). As a
result, Linux cannot simply use the privileged ISA version to determine
if the CSR is present. It must also check if any of these other
extensions are implemented. It also cannot probe the existence of the
CSR at runtime, because Linux does not require Sstrict, so (in the
absence of additional information) it cannot know if a CSR at that
address is [ms]envcfg or part of some non-conforming vendor extension.
Since there are several standard extensions that imply the existence of
the [ms]envcfg CSR, it becomes unwieldy to check for all of them
wherever the CSR is accessed. Instead, define a custom Xlinuxenvcfg ISA
extension bit that is implied by the other extensions and denotes that
the CSR exists as defined in the privileged ISA, containing at least one
of the fields common between menvcfg and senvcfg.
This extension does not need to be parsed from the devicetree or ISA
string because it can only be implemented as a subset of some other
standard extension.
Cc: <stable@vger.kernel.org> # v6.7+
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240228065559.3434837-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When the kernel is running in M-mode, the CBZE bit must be set in the
menvcfg CSR, not in senvcfg.
Cc: <stable@vger.kernel.org>
Fixes: 43c16d51a19b ("RISC-V: Enable cbo.zero in usermode")
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240228065559.3434837-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Before attempting to support the pre-ratification version of vector
found on older T-Head CPUs, disallow "v" in riscv,isa on these
platforms. The deprecated property has no clear way to communicate
the specific version of vector that is supported and much of the vendor
provided software puts "v" in the isa string. riscv,isa-extensions
should be used instead. This should not be too much of a burden for
these systems, as the vendor shipped devicetrees and firmware do not
work with a mainline kernel and will require updating.
We can limit this restriction to only ignore v in riscv,isa on CPUs
that report T-Head's vendor ID and a zero marchid. Newer T-Head CPUs
that support the ratified version of vector should report non-zero
marchid, according to Guo Ren [1].
Link: https://lore.kernel.org/linux-riscv/CAJF2gTRy5eK73=d6s7CVy9m9pB8p4rAoMHM3cZFwzg=AuF7TDA@mail.gmail.com/ [1]
Fixes: dc6667a4e7e3 ("riscv: Extending cpufeature.c to detect V-extension")
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240223-tidings-shabby-607f086cb4d7@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Charlie Jenkins <charlie@rivosinc.com> says:
Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
This patch takes heavy use of the Zbb extension using alternatives
patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
I have attempted to make these functions as optimal as possible, but I
have not ran anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the clean up and setup code so loading 32 bits at a
time is actually faster.
* b4-shazam-merge:
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
riscv: Add checksum library
riscv: Add checksum header
riscv: Add static key for misaligned accesses
asm-generic: Improve csum_fold
Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-0-1c50de5f2167@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Support static branches depending on the value of misaligned accesses.
This will be used by a later patch in the series. At any point in time,
this static branch will only be enabled if all online CPUs are
considered "fast".
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-2-1c50de5f2167@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Clément Léger <cleger@rivosinc.com> says:
This series add support for a few more extensions that are present in
the RVA22U64/RVA23U64 (either mandatory or optional) and that are useful
for userspace:
- Zicond
- Zacas
- Ztso
Series currently based on riscv/for-next.
* b4-shazam-lts:
riscv: hwprobe: export Zicond extension
riscv: hwprobe: export Zacas ISA extension
riscv: add ISA extension parsing for Zacas
dt-bindings: riscv: add Zacas ISA extension description
riscv: hwprobe: export Ztso ISA extension
riscv: add ISA extension parsing for Ztso
Link: https://lore.kernel.org/r/20231220155723.684081-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing for Zacas ISA extension which was ratified recently in the
riscv-zacas manual.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231220155723.684081-5-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add support to parse the Ztso string in the riscv,isa string. The
bindings already supports it but not the ISA parsing code.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231220155723.684081-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
There were a few single-letter extensions that we had references to
floating around in the kernel, but that never ended up as actual ISA
specs and have mostly been replaced by multi-letter extensions. This
removes the references to those extensions.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231110175903.2631-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing for Zfa ISA extension [1] which were ratified in commit
056b6ff467c7 ("Zfa is ratified") of riscv-isa-manual[2].
Link: https://drive.google.com/file/d/1VT6QIggpb59-8QRV266dEE4T8FZTxGq4/view [1]
Link: https://github.com/riscv/riscv-isa-manual/commits/056b6ff467c7 [2]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-19-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing for Zvfh[min] ISA extension[1] which were ratified in
june 2023 around commit e2ccd0548d6c ("Remove draft warnings from
Zvfh[min]") in riscv-v-spec[2].
Link: https://drive.google.com/file/d/1_Yt60HGAf1r1hx7JnsIptw0sqkBd9BQ8/view [1]
Link: https://github.com/riscv/riscv-v-spec/commits/e2ccd0548d6c [2]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-16-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing for Zihintntl ISA extension[1] that was ratified in commit
0dc91f5 ("Zihintntl is ratified") of riscv-isa-manual[2].
Link: https://drive.google.com/file/d/13_wsN8YmRfH8YWysFyTX-DjTkCnBd9hj/view [1]
Link: https://github.com/riscv/riscv-isa-manual/commit/0dc91f505e6d [2]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-13-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing for Zfh[min] ISA extensions[1].
Link: https://drive.google.com/file/d/1z3tQQLm5ALsAD77PM0l0CHnapxWCeVzP/view [1]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-10-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add parsing of some Zv* vector crypto ISA extensions that are mentioned
in "RISC-V Cryptography Extensions Volume II" [1]. These ISA extensions
are the following:
- Zvbb: Vector Basic Bit-manipulation
- Zvbc: Vector Carryless Multiplication
- Zvkb: Vector Cryptography Bit-manipulation
- Zvkg: Vector GCM/GMAC.
- Zvkned: NIST Suite: Vector AES Block Cipher
- Zvknh[ab]: NIST Suite: Vector SHA-2 Secure Hash
- Zvksed: ShangMi Suite: SM4 Block Cipher
- Zvksh: ShangMi Suite: SM3 Secure Hash
- Zvkn: NIST Algorithm Suite
- Zvknc: NIST Algorithm Suite with carryless multiply
- Zvkng: NIST Algorithm Suite with GCM.
- Zvks: ShangMi Algorithm Suite
- Zvksc: ShangMi Algorithm Suite with carryless multiplication
- Zvksg: ShangMi Algorithm Suite with GCM.
- Zvkt: Vector Data-Independent Execution Latency.
Link: https://drive.google.com/file/d/1gb9OLH-DhbCgWp7VwpPOVrrY6f3oSJLL/view [1]
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-7-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The Scalar Crypto specification defines Zk as a shorthand for the
Zkn, Zkr and Zkt extensions. The same follows for both Zkn, Zks and Zbk,
which are all shorthands for various other extensions. The detailed
breakdown can be found in their dt-binding entries.
Since Zkn also implies the Zbkb, Zbkc and Zbkx extensions, simply passing
"zk" through a DT should enable all of Zbkb, Zbkc, Zbkx, Zkn, Zkr and Zkt.
For example, setting the "riscv,isa" DT property to "rv64imafdc_zk"
should generate the following cpuinfo output:
"rv64imafdc_zicntr_zicsr_zifencei_zihpm_zbkb_zbkc_zbkx_zknd_zkne_zknh_zkr_zkt"
riscv_isa_ext_data grows a pair of new members, to permit setting the
relevant bits for "bundled" extensions, both while parsing the ISA string
and the new dedicated extension properties.
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231114141256.126749-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Zbc was documented in the dt-bindings but actually not supported in ISA
string parsing. Add it.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231114141256.126749-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Support for handling misaligned accesses in S-mode
- Probing for misaligned access support is now properly cached and
handled in parallel
- PTDUMP now reflects the SW reserved bits, as well as the PBMT and
NAPOT extensions
- Performance improvements for TLB flushing
- Support for many new relocations in the module loader
- Various bug fixes and cleanups
* tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
riscv: Optimize bitops with Zbb extension
riscv: Rearrange hwcap.h and cpufeature.h
drivers: perf: Do not broadcast to other cpus when starting a counter
drivers: perf: Check find_first_bit() return value
of: property: Add fw_devlink support for msi-parent
RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs
riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings
riscv: Don't use PGD entries for the linear mapping
RISC-V: Probe misaligned access speed in parallel
RISC-V: Remove __init on unaligned_emulation_finish()
RISC-V: Show accurate per-hart isa in /proc/cpuinfo
RISC-V: Don't rely on positional structure initialization
riscv: Add tests for riscv module loading
riscv: Add remaining module relocations
riscv: Avoid unaligned access when relocating modules
riscv: split cache ops out of dma-noncoherent.c
riscv: Improve flush_tlb_kernel_range()
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
riscv: Improve flush_tlb_range() for hugetlb pages
riscv: Improve tlb_flush()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for cbo.zero in userspace
- Support for CBOs on ACPI-based systems
- A handful of improvements for the T-Head cache flushing ops
- Support for software shadow call stacks
- Various cleanups and fixes
* tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
RISC-V: hwprobe: Fix vDSO SIGSEGV
riscv: configs: defconfig: Enable configs required for RZ/Five SoC
riscv: errata: prefix T-Head mnemonics with th.
riscv: put interrupt entries into .irqentry.text
riscv: mm: Update the comment of CONFIG_PAGE_OFFSET
riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause
riscv/mm: Fix the comment for swap pte format
RISC-V: clarify the QEMU workaround in ISA parser
riscv: correct pt_level name via pgtable_l5/4_enabled
RISC-V: Provide pgtable_l5_enabled on rv32
clocksource: timer-riscv: Increase rating of clock_event_device for Sstc
clocksource: timer-riscv: Don't enable/disable timer interrupt
lkdtm: Fix CFI_BACKWARD on RISC-V
riscv: Use separate IRQ shadow call stacks
riscv: Implement Shadow Call Stack
riscv: Move global pointer loading to a macro
riscv: Deduplicate IRQ stack switching
riscv: VMAP_STACK overflow detection thread-safe
RISC-V: cacheflush: Initialize CBO variables on ACPI systems
RISC-V: ACPI: RHCT: Add function to get CBO block sizes
...
|
|
Probing for misaligned access speed takes about 0.06 seconds. On a
system with 64 cores, doing this in smp_callin() means it's done
serially, extending boot time by 3.8 seconds. That's a lot of boot time.
Instead of measuring each CPU serially, let's do the measurements on
all CPUs in parallel. If we disable preemption on all CPUs, the
jiffies stop ticking, so we can do this in stages of 1) everybody
except core 0, then 2) core 0. The allocations are all done outside of
on_each_cpu() to avoid calling alloc_pages() with interrupts disabled.
For hotplugged CPUs that come in after the boot time measurement,
register CPU hotplug callbacks, and do the measurement there. Interrupts
are enabled in those callbacks, so they're fine to do alloc_pages() in.
Reported-by: Jisheng Zhang <jszhang@kernel.org>
Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231106225855.3121724-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912154040.3306-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Clément Léger <cleger@rivosinc.com> says:
Since commit 61cadb9 ("Provide new description of misaligned load/store
behavior compatible with privileged architecture.") in the RISC-V ISA
manual, it is stated that misaligned load/store might not be supported.
However, the RISC-V kernel uABI describes that misaligned accesses are
supported. In order to support that, this series adds support for S-mode
handling of misaligned accesses as well support for prctl(PR_UNALIGN).
Handling misaligned access in kernel allows for a finer grain control
of the misaligned accesses behavior, and thanks to the prctl() call,
can allow disabling misaligned access emulation to generate SIGBUS. User
space can then optimize its software by removing such access based on
SIGBUS generation.
This series is useful when using a SBI implementation that does not
handle misaligned traps as well as detecting misaligned accesses
generated by userspace application using the prctrl(PR_SET_UNALIGN)
feature.
This series can be tested using the spike simulator[1] and a modified
openSBI version[2] which allows to always delegate misaligned load/store to
S-mode. A test[3] that exercise various instructions/registers can be
executed to verify the unaligned access support.
[1] https://github.com/riscv-software-src/riscv-isa-sim
[2] https://github.com/rivosinc/opensbi/tree/dev/cleger/no_misaligned
[3] https://github.com/clementleger/unaligned_test
* b4-shazam-merge:
riscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN
riscv: report misaligned accesses emulation to hwprobe
riscv: annotate check_unaligned_access_boot_cpu() with __init
riscv: add support for sysctl unaligned_enabled control
riscv: add floating point insn support to misaligned access emulation
riscv: report perf event for misaligned fault
riscv: add support for misaligned trap handling in S-mode
riscv: remove unused functions in traps_misaligned.c
Link: https://lore.kernel.org/r/20231004151405.521596-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
hwprobe provides a way to report if misaligned access are emulated. In
order to correctly populate that feature, we can check if it actually
traps when doing a misaligned access. This can be checked using an
exception table entry which will actually be used when a misaligned
access is done from kernel mode.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-8-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This function is solely called as an initcall, thus annotate it with
__init.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-7-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Extensions prefixed with "Su" won't corrupt the workaround in many
cases. The only exception is when the first multi-letter extension in the
ISA string begins with "Su" and is not prefixed with an underscore.
For instance, following ISA string can confuse this QEMU workaround.
* "rv64imacsuclic" (RV64I + M + A + C + "Suclic")
However, this case is very unlikely because extensions prefixed by either
"Z", "Sm" or "Ss" will most likely precede first.
For instance, the "Suclic" extension (draft as of now) will be placed after
related "Smclic" and "Ssclic" extensions. It's also highly likely that
other unprivileged extensions like "Zba" will precede.
It's also possible to suppress the issue in the QEMU workaround with an
underscore. Following ISA string won't confuse the QEMU workaround.
* "rv64imac_suclic" (RV64I + M + A + C + delimited "Suclic")
This fix is to tell kernel developers the nature of this workaround
precisely. There are some "Su*" extensions to be ratified but don't worry
about this workaround too much.
This commit comes with other minor editorial fixes (for minor wording and
spacing issues, without changing the meaning).
Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/8a127608cf6194a6d288289f2520bd1744b81437.1690350252.git.research_trasio@irq.a4lg.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The RISC-V integer conditional (Zicond) operation extension defines
standard conditional arithmetic and conditional-select/move operations
which are inspired from the XVentanaCondOps extension. In fact, QEMU
RISC-V also has support for emulating Zicond extension.
Let us detect Zicond extension from ISA string available through
DT or ACPI.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Extend the ISA string parsing to detect the Smstateen extension. If the
extension is enabled then access to certain 'state' such as AIA CSRs in
VS mode is controlled by *stateen0 registers.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When Zicboz is present, enable its instruction (cbo.zero) in
usermode by setting its respective senvcfg bit. We don't bother
trying to set this bit per-task, which would also require an
interface for tasks to request enabling and/or disabling. Instead,
permanently set the bit for each hart which has the extension when
bringing it online.
This patch also introduces riscv_cpu_has_extension_[un]likely()
functions to check a specific hart's ISA bitmap for extensions.
Prior to checking the specific hart's bitmap in these functions
we try the bitmap which represents the LCD of extensions, but only
when we know it will use its optimized, alternatives path by gating
its call on CONFIG_RISCV_ALTERNATIVE. When alternatives are used, the
compiler ensures that the invocation of the LCD search becomes a
constant true or false. When it's true, even the new functions will
completely vanish from their callsites. OTOH, when the LCD check is
false, we need to do a search of the hart's ISA bitmap. Had we also
checked the LCD bitmap without the use of alternatives, then we would
have ended up with two bitmap searches instead of one.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-10-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
commit c818fea83de4 ("riscv: say disabling zicbom if no or bad
riscv,cbom-block-size found") improved the error messages for
zicbom but zicboz was missed since its patches were in flight
at the same time. Get 'em now.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Evan Green <evan@rivosinc.com> says:
The current setting for the hwprobe bit indicating misaligned access
speed is controlled by a vendor-specific feature probe function. This is
essentially a per-SoC table we have to maintain on behalf of each vendor
going forward. Let's convert that instead to something we detect at
runtime.
We have two assembly routines at the heart of our probe: one that
does a bunch of word-sized accesses (without aligning its input buffer),
and the other that does byte accesses. If we can move a larger number of
bytes using misaligned word accesses than we can with the same amount of
time doing byte accesses, then we can declare misaligned accesses as
"fast".
The tradeoff of reducing this maintenance burden is boot time. We spend
4-6 jiffies per core doing this measurement (0-2 on jiffie edge
alignment, and 4 on measurement). The timing loop was based on
raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number
of algorithms). By taking only the fastest iteration out of all
attempts for use in the comparison, variance between runs is very low.
On my THead C906, it looks like this:
[ 0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast
Several others have chimed in with results on slow machines with the
older algorithm, which took all runs into account, including noise like
interrupts. Even with this variation, results indicate that in all cases
(fast, slow, and emulated) the measured numbers are nowhere near each
other (always multiple factors away).
* b4-shazam-merge:
RISC-V: alternative: Remove feature_probe_func
RISC-V: Probe for unaligned access speed
Link: https://lore.kernel.org/r/20230818194136.4084400-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Rather than deferring unaligned access speed determinations to a vendor
function, let's probe them and find out how fast they are. If we
determine that an unaligned word access is faster than N byte accesses,
mark the hardware's unaligned access as "fast". Otherwise, we mark
accesses as slow.
The algorithm itself runs for a fixed amount of jiffies. Within each
iteration it attempts to time a single loop, and then keeps only the best
(fastest) loop it saw. This algorithm was found to have lower variance from
run to run than my first attempt, which counted the total number of
iterations that could be done in that fixed amount of jiffies. By taking
only the best iteration in the loop, assuming at least one loop wasn't
perturbed by an interrupt, we eliminate the effects of interrupts and
other "warm up" factors like branch prediction. The only downside is it
depends on having an rdtime granular and accurate enough to measure a
single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
leave the unaligned setting at UNKNOWN.
There is a slight change in user-visible behavior here. Previously, all
boards except the THead C906 reported misaligned access speed of
UNKNOWN. C906 reported FAST. With this change, since we're now measuring
misaligned access speed on each hart, all RISC-V systems will have this
key set as either FAST or SLOW.
Currently, we don't have a way to confidently measure the difference between
SLOW and EMULATED, so we label anything not fast as SLOW. This will
mislabel some systems that are actually EMULATED as SLOW. When we get
support for delegating misaligned access traps to the kernel (as opposed
to the firmware quietly handling it), we can explicitly test in Linux to
see if unaligned accesses trap. Those systems will start to report
EMULATED, though older (today's) systems without that new SBI mechanism
will continue to report SLOW.
I've updated the documentation for those hwprobe values to reflect
this, specifically: SLOW may or may not be emulated by software, and FAST
represents means being faster than equivalent byte accesses. The change
in documentation is accurate with respect to both the former and current
behavior.
Signed-off-by: Evan Green <evan@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230818194136.4084400-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the new "riscv,isa-extensions" and "riscv,isa-base"
device tree interfaces for probing extensions
- Support for userspace access to the performance counters
- Support for more instructions in kprobes
- Crash kernels can be allocated above 4GiB
- Support for KCFI
- Support for ELFs in !MMU configurations
- ARCH_KMALLOC_MINALIGN has been reduced to 8
- mmap() defaults to sv48-sized addresses, with longer addresses hidden
behind a hint (similar to Arm and Intel)
- Also various fixes and cleanups
* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
riscv: support PREEMPT_DYNAMIC with static keys
riscv: Move create_tmp_mapping() to init sections
riscv: Mark KASAN tmp* page tables variables as static
riscv: mm: use bitmap_zero() API
riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
riscv: remove redundant mv instructions
RISC-V: mm: Document mmap changes
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Restrict address space for sv39,sv48,sv57
riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
riscv: allow kmalloc() caches aligned to the smallest value
riscv: support the elf-fdpic binfmt loader
binfmt_elf_fdpic: support 64-bit systems
riscv: Allow CONFIG_CFI_CLANG to be selected
riscv/purgatory: Disable CFI
riscv: Add CFI error handling
riscv: Add ftrace_stub_graph
riscv: Add types to indirectly called assembly functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree include cleanups from Rob Herring:
"These are the remaining few clean-ups of DT related includes which
didn't get applied to subsystem trees"
* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
ipmi: Explicitly include correct DT includes
tpm: Explicitly include correct DT includes
lib/genalloc: Explicitly include correct DT includes
parport: Explicitly include correct DT includes
sbus: Explicitly include correct DT includes
mux: Explicitly include correct DT includes
macintosh: Explicitly include correct DT includes
hte: Explicitly include correct DT includes
EDAC: Explicitly include correct DT includes
clocksource: Explicitly include correct DT includes
sparc: Explicitly include correct DT includes
riscv: Explicitly include correct DT includes
|
|
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Link: https://lore.kernel.org/r/20230714174043.4040561-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
As it says on the tin, provide Kconfig option to control parsing the
"riscv,isa" devicetree property. If either option is used, the kernel
will fall back to parsing "riscv,isa", where "riscv,isa-base" and
"riscv,isa-extensions" are not present.
The Kconfig options are set up so that the default kernel configuration
will enable the fallback path, without needing the commandline option.
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Suggested-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-aviator-plausibly-a35662485c2c@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add support for parsing the new riscv,isa-extensions property in
riscv_fill_hwcap(), by means of a new "property" member of the
riscv_isa_ext_data struct. For now, this shadows the name of the
extension for all users, however this may not be the case for all
extensions, based on how the dt-binding is written.
For the sake of backwards compatibility, fall back to the old scheme
if the new properties are not detected. For now, just inform, rather
than warn, when that happens.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-vocation-profane-39a74b3c2649@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Before adding more complexity to it, split riscv_fill_hwcap() into 3
distinct sections:
- riscv_fill_hwcap() still is the top level function, into which the
additional complexity will be added.
- riscv_fill_hwcap_from_isa_string() handles getting the information
from the riscv,isa/ACPI equivalent across harts & the various quirks
there
- riscv_parse_isa_string() does what it says on the tin.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-daylight-puritan-37aeb41a4d9b@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
So that riscv_fill_hwcap() can use riscv_isa_ext to probe for single
letter extensions, add them to it.
As a result, what gets spat out in /proc/cpuinfo will become borked, as
single letter extensions will be printed as part of the base extensions
and while printing from riscv_isa_arr. Take the opportunity to unify the
printing of the isa string, using the new member of riscv_isa_ext_data
in the process.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-despite-bright-de00ac888cc7@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
In riscv_fill_hwcap() riscv_isa_ext array can be looped over, rather
than duplicating the list of extensions with individual
SET_ISA_EXT_MAP() usage. While at it, drop the statement-of-the-obvious
comments from the struct, rename uprop to something more suitable for
its new use & constify the members.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-dastardly-affiliate-4cf819dccde2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
To facilitate using one struct to define extensions, rather than having
several, shunt isa_ext_arr to cpufeature.c, where it will be used for
probing extension presence also.
As that scope of the array as widened, prefix it with riscv & drop the
type from the variable name.
Since the new array is const, print_isa() needs a wee bit of cleanup to
avoid complaints about losing the const qualifier.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230713-spirits-upside-a2c61c65fd5a@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
ACPI ISA strings are based on a specification after Zicsr and Zifencei
were split out of I, so we shouldn't be treating them as part of I. We
haven't release an ACPI-based kernel yet, so we don't need to worry
about compatibility with the old ISA strings.
Fixes: 07edc32779e3 ("RISC-V: always report presence of extensions formerly part of the base ISA")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230711224600.10879-1-palmer@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
Here are some bits that were discussed with Drew on the "should we
allow caps" threads that I have now created patches for:
- splitting of riscv_of_processor_hartid() into two distinct functions,
one for use purely during early boot, prior to the establishment of
the possible-cpus mask & another to fit the other current use-cases
- that then allows us to then completely skip some validation of the
hartid in the parser
- the biggest diff in the series is a rework of the comments in the
parser, as I have mostly found the existing (sparse) ones to not be
all that helpful whenever I have to go back and look at it
- from writing the comments, I found a conditional doing a bit of a
dance that I found counter-intuitive, so I've had a go at making that
match what I would expect a little better
- `i` implies 4 other extensions, so add them as extensions and set
them for the craic. Sure why not like...
* b4-shazam-merge:
RISC-V: always report presence of extensions formerly part of the base ISA
dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
RISC-V: remove decrement/increment dance in ISA string parser
RISC-V: rework comments in ISA string parser
RISC-V: validate riscv,isa at boot, not during ISA string parsing
RISC-V: split early & late of_node to hartid mapping
RISC-V: simplify register width check in ISA string parsing
Link: https://lore.kernel.org/r/20230607-audacity-overhaul-82bb867a825f@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Of these four extensions, two were part of the base ISA when the port was
written and are required by the kernel. The other two are implied when
`i` is in riscv,isa on DT systems.
There's not much that userspace can do with this extra information, but
there is no harm in reporting an ISA string that closer resembles the
current versions of the specifications either.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230607-nest-collision-5796b6be8be6@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
While expanding on the comments in the ISA string parsing code, I
noticed that the conditional decrement of `isa` at the end of the loop
was a bit odd.
The parsing code expects that at the start of the for loop, `isa` will
point to the first character of the next unparsed extension.
However, depending on what the next extension is, this may not be true.
Unless the next extension is a multi-letter extension preceded by an
underscore, `isa` will either point to the string's null-terminator or
to the first character of the next extension, once the switch statement
has been evaluated.
Obviously incrementing `isa` at the end of the loop could cause it to
increment past the null terminator or miss a single letter extension, so
`isa` is conditionally decremented, just so that the loop can increment
it again.
It's easier to understand the code if, instead of this decrement +
increment dance, we instead use a while loop & rely on the handling of
individual extension types to leave `isa` pointing to the first
character of the next extension.
As already mentioned, this won't be the case where the following
extension is multi-letter & preceded by an underscore. To handle that,
invert the check and increment rather than decrement.
Hopefully this eliminates a "huh?!?" moment the next time somebody tries
to understand this code.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-estate-left-f20faabefb89@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
I have found these comments to not be at all helpful whenever I look at
the parser. Further, the comments in the default case (single letter
parser) are not quite right either.
Group the comments into a larger one at the start of each case, that
attempts to explain things at a higher level.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230607-headpiece-tannery-83ed5cc4856a@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Since riscv_fill_hwcap() now only iterates over possible cpus, the
basic validation of whether riscv,isa contains "rv<width>" can be moved
to riscv_early_of_processor_hartid().
Further, "ima" support is required by the kernel, so reject any CPU not
fitting the bill.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-guts-blurry-67e711acf328@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Saving off the `isa` pointer to a temp variable, followed by checking if
it has been incremented is a bit of an odd pattern. Perhaps it was done
to avoid a funky looking if statement mixed with the ifdeffery.
Now that we use IS_ENABLED() here just return from the parser as soon as
we detect a mismatch between the string and the currently running
kernel.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-splatter-bacterium-a75bb9f0d0b7@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Evan Green <evan@rivosinc.com> says:
This change detects the presence of Zba, Zbb, and Zbs extensions and exports
them per-hart to userspace via the hwprobe mechanism. Glibc can then use
these in setting up hwcaps-based library search paths.
There's a little bit of extra housekeeping here: the first change adds
Zba and Zbs to the set of extensions the kernel recognizes, and the second
change starts tracking ISA features per-hart (in addition to the ANDed
mask of features across all harts which the kernel uses to make
decisions). Now that we track the ISA information per-hart, we could
even fix up /proc/cpuinfo to accurately report extension per-hart,
though I've left that out of this series for now.
* b4-shazam-merge:
RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
RISC-V: Track ISA extensions per hart
RISC-V: Add Zba, Zbs extension probing
Link: https://lore.kernel.org/r/20230509182504.2997252-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The kernel maintains a mask of ISA extensions ANDed together across all
harts. Let's also keep a bitmap of ISA extensions for each CPU. Although
the kernel is currently unlikely to enable a feature that exists only on
some CPUs, we want the ability to report asymmetric CPU extensions
accurately to usermode.
Note that riscv_fill_hwcaps() runs before the per_cpu_offsets are built,
which is why I've used a [NR_CPUS] array rather than per_cpu() data.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230509182504.2997252-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add the Zba address bit manipulation extension and Zbs single bit
instructions extension into those the kernel is aware of and maintains
in its riscv_isa bitmap.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230509182504.2997252-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|