summaryrefslogtreecommitdiff
path: root/arch/riscv/include
AgeCommit message (Collapse)Author
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20Merge tag 'riscv-for-linus-6.11-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension * Zimop and Zcmop for may-be-operations * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension * Zawrs - riscv,cpu-intc is now dtschema - A handful of performance improvements and cleanups to text patching - Support for memory hot{,un}plug - The highest user-allocatable virtual address is now visible in hwprobe * tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits) riscv: lib: relax assembly constraints in hweight riscv: set trap vector earlier KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' riscv: hwprobe: export highest virtual userspace address riscv: Improve sbi_ecall() code generation by reordering arguments riscv: Add tracepoints for SBI calls and returns riscv: Optimize crc32 with Zbc extension riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add ...
2024-07-18Merge tag 'ftrace-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: "Rewrite of function graph tracer to allow multiple users Up until now, the function graph tracer could only have a single user attached to it. If another user tried to attach to the function graph tracer while one was already attached, it would fail. Allowing function graph tracer to have more than one user has been asked for since 2009, but it required a rewrite to the logic to pull it off so it never happened. Until now! There's three systems that trace the return of a function. That is kretprobes, function graph tracer, and BPF. kretprobes and function graph tracing both do it similarly. The difference is that kretprobes uses a shadow stack per callback and function graph tracer creates a shadow stack for all tasks. The function graph tracer method makes it possible to trace the return of all functions. As kretprobes now needs that feature too, allowing it to use function graph tracer was needed. BPF also wants to trace the return of many probes and its method doesn't scale either. Having it use function graph tracer would improve that. By allowing function graph tracer to have multiple users allows both kretprobes and BPF to use function graph tracer in these cases. This will allow kretprobes code to be removed in the future as it's version will no longer be needed. Note, function graph tracer is only limited to 16 simultaneous users, due to shadow stack size and allocated slots" * tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits) fgraph: Use str_plural() in test_graph_storage_single() function_graph: Add READ_ONCE() when accessing fgraph_array[] ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct() function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it function_graph: Fix up ftrace_graph_ret_addr() function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests fgraph: Remove some unused functions ftrace: Hide one more entry in stack trace when ftrace_pid is enabled function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled function_graph: Make fgraph_do_direct static key static ftrace: Fix prototypes for ftrace_startup/shutdown_subops() ftrace: Assign RCU list variable with rcu_assign_ptr() ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU ftrace: Declare function_trace_op in header to quiet sparse warning ftrace: Add comments to ftrace_hash_move() and friends ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify() ftrace: Add comments to ftrace_hash_rec_disable/enable() ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update() ftrace: Rename dup_hash() and comment it ...
2024-07-16Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups
2024-07-15riscv: lib: relax assembly constraints in hweightQingfang Deng
rd and rs don't have to be the same. In some cases where rs needs to be saved for later usage, this will save us some mv instructions. Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Xiao Wang <xiao.w.wang@intel.com> Link: https://lore.kernel.org/r/20240527092405.134967-1-dqfext@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12mm: provide mm_struct and address to huge_ptep_get()Christophe Leroy
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is a PTE entry or a PMD entry. This cannot be known with the PMD entry itself because there is no easy way to know it from the content of the entry. So huge_ptep_get() will need to know either the size of the page or get the pmd. In order to be consistent with huge_ptep_get_and_clear(), give mm and address to huge_ptep_get(). Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12Merge patch series "riscv: Apply Zawrs when available"Palmer Dabbelt
Andrew Jones <ajones@ventanamicro.com> says: Zawrs provides two instructions (wrs.nto and wrs.sto), where both are meant to allow the hart to enter a low-power state while waiting on a store to a memory location. The instructions also both wait an implementation-defined "short" duration (unless the implementation terminates the stall for another reason). The difference is that while wrs.sto will terminate when the duration elapses, wrs.nto, depending on configuration, will either just keep waiting or an ILL exception will be raised. Linux will use wrs.nto, so if platforms have an implementation which falls in the "just keep waiting" category (which is not expected), then it should _not_ advertise Zawrs in the hardware description. Like wfi (and with the same {m,h}status bits to configure it), when wrs.nto is configured to raise exceptions it's expected that the higher privilege level will see the instruction was a wait instruction, do something, and then resume execution following the instruction. For example, KVM does configure exceptions for wfi (hstatus.VTW=1) and therefore also for wrs.nto. KVM does this for wfi since it's better to allow other tasks to be scheduled while a VCPU waits for an interrupt. For waits such as those where wrs.nto/sto would be used, which are typically locks, it is also a good idea for KVM to be involved, as it can attempt to schedule the lock holding VCPU. This series starts with Christoph's addition of the riscv smp_cond_load_relaxed function which applies wrs.sto when available. That patch has been reworked to use wrs.nto and to use the same approach as Arm for the wait loop, since we can't have arbitrary C code between the load-reserved and the wrs. Then, hwprobe support is added (since the instructions are also usable from usermode), and finally KVM is taught about wrs.nto, allowing guests to see and use the Zawrs extension. We still don't have test results from hardware, and it's not possible to prove that using Zawrs is a win when testing on QEMU, not even when oversubscribing VCPUs to guests. However, it is possible to use KVM selftests to force a scenario where we can prove Zawrs does its job and does it well. [4] is a test which does this and, on my machine, without Zawrs it takes 16 seconds to complete and with Zawrs it takes 0.25 seconds. This series is also available here [1]. In order to use QEMU for testing a build with [2] is needed. In order to enable guests to use Zawrs with KVM using kvmtool, the branch at [3] may be used. [1] https://github.com/jones-drew/linux/commits/riscv/zawrs-v3/ [2] https://lore.kernel.org/all/20240312152901.512001-2-ajones@ventanamicro.com/ [3] https://github.com/jones-drew/kvmtool/commits/riscv/zawrs/ [4] https://github.com/jones-drew/linux/commit/cb2beccebcece10881db842ed69bdd5715cfab5d Link: https://lore.kernel.org/r/20240426100820.14762-8-ajones@ventanamicro.com * b4-shazam-merge: KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12KVM: riscv: Support guest wrs.ntoAndrew Jones
When a guest traps on wrs.nto, call kvm_vcpu_on_spin() to attempt to yield to the lock holding VCPU. Also extend the KVM ISA extension ONE_REG interface to allow KVM userspace to detect and enable the Zawrs extension for the Guest/VM. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240426100820.14762-13-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12riscv: hwprobe: export Zawrs ISA extensionAndrew Jones
Export Zawrs ISA extension through hwprobe. [Palmer: there's a gap in the numbers here as there will be a merge conflict when this is picked up. To avoid confusion I just set the hwprobe ID to match what it would be post-merge.] Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20240426100820.14762-12-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12Merge tag 'loongarch-kvm-6.11' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.11 1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch.
2024-07-12Merge tag 'kvm-riscv-6.11-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.11 - Redirect AMO load/store access fault traps to guest - Perf kvm stat support for RISC-V - Use guest files for IMSIC virtualization, when available ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA extensions is coming through the RISC-V tree.
2024-07-12riscv: Add Zawrs support for spinlocksChristoph Müllner
RISC-V code uses the generic ticket lock implementation, which calls the macros smp_cond_load_relaxed() and smp_cond_load_acquire(). Introduce a RISC-V specific implementation of smp_cond_load_relaxed() which applies WRS.NTO of the Zawrs extension in order to reduce power consumption while waiting and allows hypervisors to enable guests to trap while waiting. smp_cond_load_acquire() doesn't need a RISC-V specific implementation as the generic implementation is based on smp_cond_load_relaxed() and smp_acquire__after_ctrl_dep() sufficiently provides the acquire semantics. This implementation is heavily based on Arm's approach which is the approach Andrea Parri also suggested. The Zawrs specification can be found here: https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Co-developed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240426100820.14762-11-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12riscv: Provide a definition for 'pause'Andrew Jones
If we're going to provide the encoding for 'pause' in cpu_relax() anyway, then we can drop the toolchain checks and just always use it. The advantage of doing this is that other code that need pause don't need to also define it (yes, another use is coming). Add the definition to insn-def.h since it's an instruction definition and also because insn-def.h doesn't include much, so it's safe to include from asm/vdso/processor.h without concern for circular dependencies. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240426100820.14762-9-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-11riscv: hwprobe: export highest virtual userspace addressClément Léger
Some userspace applications (OpenJDK for instance) uses the free MSBs in pointers to insert additional information for their own logic and need to get this information from somewhere. Currently they rely on parsing /proc/cpuinfo "mmu=svxx" string to obtain the current value of virtual address usable bits [1]. Since this reflect the raw supported MMU mode, it might differ from the logical one used internally which is why arch_get_mmap_end() is used. Exporting the highest mmapable address through hwprobe will allow a more stable interface to be used. For that purpose, add a new hwprobe key named RISCV_HWPROBE_KEY_HIGHEST_VIRT_ADDRESS which will export the highest userspace virtual address. Link: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L171 [1] Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240410144558.1104006-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10riscv: Remove unnecessary int cast in variable_fls()Thorsten Blum
__builtin_clz() returns an int and casting the whole expression to int is unnecessary. Remove it. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2024-07-10riscv: Improve sbi_ecall() code generation by reordering argumentsAlexandre Ghiti
The sbi_ecall() function arguments are not in the same order as the ecall arguments, so we end up re-ordering the registers before the ecall which is useless and costly. So simply reorder the arguments in the same way as expected by ecall. Instead of reordering directly the arguments of sbi_ecall(), use a proxy macro since the current ordering is more natural. Before: Dump of assembler code for function sbi_ecall: 0xffffffff800085e0 <+0>: add sp,sp,-32 0xffffffff800085e2 <+2>: sd s0,24(sp) 0xffffffff800085e4 <+4>: mv t1,a0 0xffffffff800085e6 <+6>: add s0,sp,32 0xffffffff800085e8 <+8>: mv t3,a1 0xffffffff800085ea <+10>: mv a0,a2 0xffffffff800085ec <+12>: mv a1,a3 0xffffffff800085ee <+14>: mv a2,a4 0xffffffff800085f0 <+16>: mv a3,a5 0xffffffff800085f2 <+18>: mv a4,a6 0xffffffff800085f4 <+20>: mv a5,a7 0xffffffff800085f6 <+22>: mv a6,t3 0xffffffff800085f8 <+24>: mv a7,t1 0xffffffff800085fa <+26>: ecall 0xffffffff800085fe <+30>: ld s0,24(sp) 0xffffffff80008600 <+32>: add sp,sp,32 0xffffffff80008602 <+34>: ret After: Dump of assembler code for function __sbi_ecall: 0xffffffff8000b6b2 <+0>: add sp,sp,-32 0xffffffff8000b6b4 <+2>: sd s0,24(sp) 0xffffffff8000b6b6 <+4>: add s0,sp,32 0xffffffff8000b6b8 <+6>: ecall 0xffffffff8000b6bc <+10>: ld s0,24(sp) 0xffffffff8000b6be <+12>: add sp,sp,32 0xffffffff8000b6c0 <+14>: ret Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240322112629.68170-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10riscv: Add tracepoints for SBI calls and returnsSamuel Holland
These are useful for measuring the latency of SBI calls. The SBI HSM extension is excluded because those functions are called from contexts such as cpuidle where instrumentation is not allowed. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240321230131.1838105-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10riscv: convert to generic syscall tableArnd Bergmann
The uapi/asm/unistd_{32,64}.h and asm/syscall_table_{32,64}.h headers can now be generated from scripts/syscall.tbl, which makes this consistent with the other architectures that have their own syscall.tbl. riscv has two extra system call that gets added to scripts/syscall.tbl. The newstat and rlimit entries in the syscall_abis_64 line are for system calls that were part of the generic ABI when riscv64 got added but are no longer enabled by default for new architectures. Both riscv32 and riscv64 also implement memfd_secret, which is optional for all architectures. Unlike all the other 32-bit architectures, the time32 and stat64 sets of syscalls are not enabled on riscv32. Both the user visible side of asm/unistd.h and the internal syscall table in the kernel should have the same effective contents after this. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10clone3: drop __ARCH_WANT_SYS_CLONE3 macroArnd Bergmann
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-03mm: implement update_mmu_tlb() using update_mmu_tlb_range()Bang Li
Let's make update_mmu_tlb() simply a generic wrapper around update_mmu_tlb_range(). Only the latter can now be overridden by the architecture. We can now remove __HAVE_ARCH_UPDATE_MMU_TLB as well. Link: https://lkml.kernel.org/r/20240522061204.117421-3-libang.li@antgroup.com Signed-off-by: Bang Li <libang.li@antgroup.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Lance Yang <ioworker0@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03mm: add update_mmu_tlb_range()Bang Li
Patch series "Add update_mmu_tlb_range() to simplify code", v4. This series of commits mainly adds the update_mmu_tlb_range() to batch update tlb in an address range and implement update_mmu_tlb() using update_mmu_tlb_range(). After commit 19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP"), We may need to batch update tlb of a certain address range by calling update_mmu_tlb() in a loop. Using the update_mmu_tlb_range(), we can simplify the code and possibly reduce the execution of some unnecessary code in some architectures. This patch (of 3): Add update_mmu_tlb_range(), we can batch update tlb of an address range. Link: https://lkml.kernel.org/r/20240522061204.117421-1-libang.li@antgroup.com Link: https://lkml.kernel.org/r/20240522061204.117421-2-libang.li@antgroup.com Signed-off-by: Bang Li <libang.li@antgroup.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Lance Yang <ioworker0@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-26Merge patch series "riscv: Memory Hot(Un)Plug support"Palmer Dabbelt
Björn Töpel <bjorn@kernel.org> says: From: Björn Töpel <bjorn@rivosinc.com> ================================================================ Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port ================================================================ Introduction ============ To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory hot(un)plug allows for increasing and decreasing the size of physical memory available to a machine at runtime." This series adds memory hot(un)plugging, and ZONE_DEVICE support for the RISC-V Linux port. MM configuration ================ RISC-V MM has the following configuration: * Memory blocks are 128M, analogous to x86-64. It uses PMD ("hugepage") vmemmaps. From that follows that 2M (PMD) worth of vmemmap spans 32768 pages á 4K which gets us 128M. * The pageblock size is the minimum minimum virtio_mem size, and on RISC-V it's 2M (2^9 * 4K). Implementation ============== The PGD table on RISC-V is shared/copied between for all processes. To avoid doing page table synchronization, the first patch (patch 1) pre-allocated the PGD entries for vmemmap/direct map. By doing that the init_mm PGD will be fixed at kernel init, and synchronization can be avoided all together. The following two patches (patch 2-3) does some preparations, followed by the actual MHP implementation (patch 4-5). Then, MHP and virtio-mem are enabled (patch 6-7), and finally ZONE_DEVICE support is added (patch 8). MHP and locking =============== TL;DR: The MHP does not step on any toes, except for ptdump. Additional locking is required for ptdump. Long version: For v2 I spent some time digging into init_mm synchronization/update. Here are my findings, and I'd love them to be corrected if incorrect. It's been a gnarly path... The `init_mm` structure is a special mm (perhaps not a "real" one). It's a "lazy context" that tracks kernel page table resources, e.g., the kernel page table (swapper_pg_dir), a kernel page_table_lock (more about the usage below), mmap_lock, and such. `init_mm` does not track/contain any VMAs. Having the `init_mm` is convenient, so that the regular kernel page table walk/modify functions can be used. Now, `init_mm` being special means that the locking for kernel page tables are special as well. On RISC-V the PGD (top-level page table structure), similar to x86, is shared (copied) with user processes. If the kernel PGD is modified, it has to be synched to user-mode processes PGDs. This is avoided by pre-populating the PGD, so it'll be fixed from boot. The in-kernel pgd regions are documented in `Documentation/arch/riscv/vm-layout.rst`. The distinct regions are: * vmemmap * vmalloc/ioremap space * direct mapping of all physical memory * kasan * modules, BPF * kernel Memory hotplug is the process of adding/removing memory to/from the kernel. Adding is done in two phases: 1. Add the memory to the kernel 2. Online memory, making it available to the page allocator. Step 1 is partially architecture dependent, and updates the init_mm page table: * Update the direct map page tables. The direct map is a linear map, representing all physical memory: `virt = phys + PAGE_OFFSET` * Add a `struct page` for each added page of memory. Update the vmemmap (virtual mapping to the `struct page`, so we can easily transform a kernel virtual address to a `struct page *` address. From an MHP perspective, there are two regions of the PGD that are updated: * vmemmap * direct mapping of all physical memory The `struct mm_struct` has a couple of locks in play: * `spinlock_t page_table_lock` protects the page table, and some counters * `struct rw_semaphore mmap_lock` protect an mm's VMAs Note again that `init_mm` does not contain any VMAs, but still uses the mmap_lock in some places. The `page_table_lock` was originally used to to protect all pages tables, but more recently a split page table lock has been introduced. The split lock has a per-table lock for the PTE and PMD tables. If split lock is disabled, all tables are guarded by `mm->page_table_lock` (for user processes). Split page table locks are not used for init_mm. MHP operations is typically synchronized using `DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock)`. Actors ------ The following non-MHP actors in the kernel traverses (read), and/or modifies the kernel PGD. * `ptdump` Walks the entire `init_mm`, via `ptdump_walk_pgd()` with the `mmap_write_lock(init_mm)` taken. Observation: ptdump can race with MHP, and needs additional locking to avoid crashes/races. * `set_direct_*` / `arch/riscv/mm/pageattr.c` The `set_direct_*` functionality is used to "synchronize" the direct map to other kernel mappings, e.g. modules/kernel text. The direct map is using "as large huge table mappings as possible", which means that the `set_direct_*` might need to split the direct map. The `set_direct_*` functions operates with the `mmap_write_lock(init_mm)` taken. Observation: `set_direct_*` uses the direct map, but will never modify the same entry as MHP. If there is a mapping, that entry will never race with MHP. Further, MHP acts when memory is offline. * HVO / `mm/hugetlb_vmemmap` HVO optimizes the backing `struct page` for hugetlb pages, which means changing the "vmemmap" region. HVO can split (merge?) a vmemmap pmd. However, it will never race with MHP, since HVO only operates at online memory. HVO cannot touch memory being MHP added or removed. * `apply_to_page_range` Walks a range, creates pages and applies a callback (setting permissions) for the page. When creating a table, it might use `int __pte_alloc_kernel(pmd_t *pmd)` which takes the `init_mm.page_table_lock` to synchronize pmd populate. Used by: `mm/vmalloc.c` and `mm/kasan/shadow.c`. The KASAN callback takes the `init_mm.page_table_lock` to synchronize pte creation. Observations: `apply_to_page_range` applies to the "vmalloc/ioremap space" region, and "kasan" region. *Not* affected by MHP. * `apply_to_existing_page_range` Walks a range, applies a callback (setting permissions) for the page (no page creation). Used by: `kernel/bpf/arena.c` and `mm/kasan/shadow.c`. The KASAN callback takes the `init_mm.page_table_lock` to synchronize pte creation. *Not* affected by MHP regions. * `apply_to_existing_page_range` applies to the "vmalloc/ioremap space" region, and "kasan" region. *Not* affected by MHP regions. * `ioremap_page_range` and `vmap_page_range` Uses the same internal function, and might create table entries at the "vmalloc/ioremap space" region. Can call `__pte_alloc_kernel()` which takes the `init_mm.page_table_lock` synchronizing pmd populate in the region. *Not* affected by MHP regions. Summary: * MHP add will never modify the same page table entries, as any of the other actors. * MHP remove is done when memory is offlined, and will not clash with any of the actors. * Functions that walk the entire kernel page table need synchronization * It's sufficient to add the MHP lock ptdump. Testing ======= This series adds basic DT supported hotplugging. There is a QEMU series enabling MHP for the RISC-V "virt" machine here: [1] ACPI/MSI support is still in the making for RISC-V, and prior proper (ACPI) PCI MSI support lands [2] and NUMA SRAT support [3], it hard to try it out. I've prepared a QEMU branch with proper ACPI GED/PC-DIMM support [4], and a this series with the required prerequisites [5] (AIA, ACPI AIA MADT, ACPI NUMA SRAT). To test with virtio-mem, e.g.: | qemu-system-riscv64 \ | -machine virt,aia=aplic-imsic \ | -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \ | -nodefaults \ | -nographic -smp 8 -kernel rv64-u-boot.bin \ | -drive file=rootfs.img,format=raw,if=virtio \ | -device virtio-rng-pci \ | -m 16G,slots=3,maxmem=32G \ | -object memory-backend-ram,id=mem0,size=16G \ | -numa node,nodeid=0,memdev=mem0 \ | -serial chardev:char0 \ | -mon chardev=char0,mode=readline \ | -chardev stdio,mux=on,id=char0 \ | -device pci-serial,id=serial0,chardev=char0 \ | -object memory-backend-ram,id=vmem0,size=2G \ | -device virtio-mem-pci,id=vm0,memdev=vmem0,node=0 where "rv64-u-boot.bin" is U-boot with EFI/ACPI-support (use [6] if you're lazy). In the QEMU monitor: | (qemu) info memory-devices | (qemu) qom-set vm0 requested-size 1G ...to test DAX/KMEM, use the follow QEMU parameters: | -object memory-backend-file,id=mem1,share=on,mem-path=virtio_pmem.img,size=4G \ | -device virtio-pmem-pci,memdev=mem1,id=nv1 and the regular ndctl/daxctl dance. If you're brave to try the ACPI branch, add "acpi=on" to "-machine virt", and test PC-DIMM MHP (in addition to virtio-{p},mem): In the QEMU monitor: | (qemu) object_add memory-backend-ram,id=mem1,size=1G | (qemu) device_add pc-dimm,id=dimm1,memdev=mem1 You can also try hot-remove with some QEMU options, say: | -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB | -device pc-dimm,id=mem1,memdev=mem-1 | -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB | -device pc-dimm,id=mem2,memdev=mem-2 | -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB | -device pc-dimm,id=mem3,memdev=mem-3 Remove "acpi=on" to run with DT. Thanks to Alex, Andrew, David, and Oscar for all comments/tests/fixups. References ========== [1] https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/ [2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-sunilvl@ventanamicro.com/ [3] https://lore.kernel.org/linux-riscv/cover.1713778236.git.haibo1.xu@intel.com/ [4] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi-v2/ [5] https://github.com/bjoto/linux/commits/mhp-v4-acpi [6] https://github.com/bjoto/riscv-rootfs-utils/tree/acpi * b4-shazam-merge: riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add riscv: mm: Change attribute from __init to __meminit for page functions riscv: mm: Pre-allocate vmemmap/direct map/kasan PGD entries riscv: mm: Properly forward vmemmap_populate() altmap parameter Link: https://lore.kernel.org/r/20240605114100.315918-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: mm: Add support for ZONE_DEVICEBjörn Töpel
ZONE_DEVICE pages need DEVMAP PTEs support to function (ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit in the PTE for DEVMAP mark, add the corresponding helpers, and enable ARCH_HAS_PTE_DEVMAP for riscv64. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-11-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: mm: Add pfn_to_kaddr() implementationBjörn Töpel
The pfn_to_kaddr() function is used by KASAN's memory hotplugging path. Add the missing function to the RISC-V port, so that it can be built with MHP and CONFIG_KASAN. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-6-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: mm: Change attribute from __init to __meminit for page functionsBjörn Töpel
Prepare for memory hotplugging support by changing from __init to __meminit for the page table functions that are used by the upcoming architecture specific callbacks. Changing the __init attribute to __meminit, avoids that the functions are removed after init. The __meminit attribute makes sure the functions are kept in the kernel text post init, but only if memory hotplugging is enabled for the build. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-4-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: mm: Pre-allocate vmemmap/direct map/kasan PGD entriesBjörn Töpel
The RISC-V port copies the PGD table from init_mm/swapper_pg_dir to all userland page tables, which means that if the PGD level table is changed, other page tables has to be updated as well. Instead of having the PGD changes ripple out to all tables, the synchronization can be avoided by pre-allocating the PGD entries/pages at boot, avoiding the synchronization all together. This is currently done for the bpf/modules, and vmalloc PGD regions. Extend this scheme for the PGD regions touched by memory hotplugging. Prepare the RISC-V port for memory hotplug by pre-allocate vmemmap/direct map/kasan entries at the PGD level. This will roughly waste ~128 (plus 32 if KASAN is enabled) worth of 4K pages when memory hotplugging is enabled in the kernel configuration. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-3-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: dmi: Add SMBIOS/DMI supportHaibo Xu
Enable the dmi driver for riscv which would allow access the SMBIOS info through some userspace file(/sys/firmware/dmi/*). The change was based on that of arm64 and has been verified by dmidecode tool. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240613065507.287577-1-haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: Implement pte_accessible()Alexandre Ghiti
Like other architectures, a pte is accessible if it is present or if there is a pending tlb flush and the pte is protnone (which could be the case when a pte is downgraded to protnone before a flush tlb is executed). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240128115953.25085-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26Merge patch series "Add support for a few Zc* extensions, Zcmop and Zimop"Palmer Dabbelt
Clément Léger <cleger@rivosinc.com> says: Add support for (yet again) more RVA23U64 missing extensions. Add support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have been left out since they target microcontrollers/embedded CPUs and are not needed by RVA23U64. Since Zc* extensions states that C implies Zca, Zcf (if F and RV32), Zcd (if D), this series modifies the way ISA string is parsed and now does it in two phases. First one parses the string and the second one validates it for the final ISA description. * b4-shazam-merge: KVM: riscv: selftests: Add Zcmop extension to get-reg-list test RISC-V: KVM: Allow Zcmop extension for Guest/VM riscv: hwprobe: export Zcmop ISA extension riscv: add ISA extension parsing for Zcmop dt-bindings: riscv: add Zcmop ISA extension description KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb riscv: add ISA extensions validation callback dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension description KVM: riscv: selftests: Add Zimop extension to get-reg-list test RISC-V: KVM: Allow Zimop extension for Guest/VM riscv: hwprobe: export Zimop ISA extension riscv: add ISA extension parsing for Zimop dt-bindings: riscv: add Zimop ISA extension description Link: https://lore.kernel.org/r/20240619113529.676940-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26RISC-V: KVM: Allow Zcmop extension for Guest/VMClément Léger
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zcmop extension for Guest/VM. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240619113529.676940-16-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: hwprobe: export Zcmop ISA extensionClément Léger
Export Zcmop ISA extension through hwprobe. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Evan Green <evan@rivosinc.com> Link: https://lore.kernel.org/r/20240619113529.676940-15-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: add ISA extension parsing for ZcmopClément Léger
Add parsing for Zcmop ISA extension which was ratified in commit c732a4f39a4c ("Zcmop is ratified/1.0") of the riscv-isa-manual. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240619113529.676940-14-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VMClément Léger
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zca, Zcf, Zcd and Zcb extensions for Guest/VM. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240619113529.676940-11-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensionsClément Léger
Export Zca, Zcf, Zcd and Zcb ISA extension through hwprobe. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240619113529.676940-10-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: add ISA parsing for Zca, Zcf, Zcd and ZcbClément Léger
The Zc* standard extension for code reduction introduces new extensions. This patch adds support for Zca, Zcf, Zcd and Zcb. Zce, Zcmt and Zcmp are left out of this patch since they are targeting microcontrollers/ embedded CPUs instead of application processors. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240619113529.676940-9-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: add ISA extensions validation callbackClément Léger
Since a few extensions (Zicbom/Zicboz) already needs validation and future ones will need it as well (Zc*) add a validate() callback to struct riscv_isa_ext_data. This require to rework the way extensions are parsed and split it in two phases. First phase is isa string or isa extension list parsing and consists in enabling all the extensions in a temporary bitmask (source isa) without any validation. The second step "resolves" the final isa bitmap, handling potential missing dependencies. The mechanism is quite simple and simply validate each extension described in the source bitmap before enabling it in the resolved isa bitmap. validate() callbacks can return either 0 for success, -EPROBEDEFER if extension needs to be validated again at next loop. A previous ISA bitmap is kept to avoid looping multiple times if an extension dependencies are never satisfied until we reach a stable state. In order to avoid any potential infinite looping, allow looping a maximum of the number of extension we handle. Zicboz and Zicbom extensions are modified to use this validation mechanism. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240619113529.676940-8-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26RISC-V: KVM: Allow Zimop extension for Guest/VMClément Léger
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zimop extension for Guest/VM. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240619113529.676940-5-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: hwprobe: export Zimop ISA extensionClément Léger
Export Zimop ISA extension through hwprobe. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240619113529.676940-4-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: add ISA extension parsing for ZimopClément Léger
Add parsing for Zimop ISA extension which was ratified in commit 58220614a5f of the riscv-isa-manual. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240619113529.676940-3-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26Merge patch series "riscv: Various text patching improvements"Palmer Dabbelt
Samuel Holland <samuel.holland@sifive.com> says: Here are a few changes to minimize calls to stop_machine() and flush_icache_*() in the various text patching functions, as well as to simplify the code. * b4-shazam-merge: riscv: Remove extra variable in patch_text_nosync() riscv: Use offset_in_page() in text patching functions riscv: Pass patch_text() the length in bytes riscv: Simplify text patching loops riscv: kprobes: Use patch_text_nosync() for insn slots riscv: jump_label: Simplify assembly syntax riscv: jump_label: Batch icache maintenance Link: https://lore.kernel.org/r/20240327160520.791322-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: Pass patch_text() the length in bytesSamuel Holland
patch_text_nosync() already handles an arbitrary length of code, so this removes a superfluous loop and reduces the number of icache flushes. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240327160520.791322-6-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: jump_label: Simplify assembly syntaxSamuel Holland
The idiomatic way to write "jal zero" is "j". Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327160520.791322-3-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: jump_label: Batch icache maintenanceSamuel Holland
Switch to the batched version of the jump label update functions so instruction cache maintenance is deferred until the end of the update. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327160520.791322-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26RISC-V: KVM: Share APLIC and IMSIC defines with irqchip driversAnup Patel
We have common APLIC and IMSIC headers available under include/linux/irqchip/ directory which are used by APLIC and IMSIC irqchip drivers. Let us replace the use of kvm_aia_*.h headers with include/linux/irqchip/riscv-*.h headers. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20240411090639.237119-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-06-25RISC-V: fix vector insn load/store width maskJesse Taube
RVFDQ_FL_FS_WIDTH_MASK should be 3 bits [14-12], shifted down by 12 bits. Replace GENMASK(3, 0) with GENMASK(2, 0). Fixes: cd054837243b ("riscv: Allocate user's vector context in the first-use trap") Signed-off-by: Jesse Taube <jesse@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240606182800.415831-1-jesse@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-11KVM: Delete the now unused kvm_arch_sched_in()Sean Christopherson
Delete kvm_arch_sched_in() now that all implementations are nops. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240522014013.1672962-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-11function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove itSteven Rostedt (Google)
All architectures that implement function graph also implements HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a differentiator. Link: https://lore.kernel.org/linux-trace-kernel/20240611031737.982047614@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-30riscv: vector: adjust minimum Vector requirement to ZVE32XAndy Chiu
Make has_vector() to check for ZVE32X. Every in-kernel usage of V that requires a more complicate version of V must then call out explicitly. Also, change riscv_v_first_use_handler(), and boot code that calls riscv_v_setup_vsize() to accept ZVE32X. Most kernel/user interfaces requires minimum of ZVE32X. Thus, programs compiled and run with ZVE32X should be supported by the kernel on most aspects. This includes context-switch, signal, ptrace, prctl, and hwprobe. One exception is that ELF_HWCAP returns 'V' only if full V is supported on the platform. This means that the system without a full V must not rely on ELF_HWCAP to tell whether it is allowable to execute Vector without first invoking a prctl() check. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240510-zve-detection-v5-7-0711bdd26c12@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30riscv: hwprobe: add zve Vector subextensions into hwprobe interfaceAndy Chiu
The following Vector subextensions for "embedded" platforms are added into RISCV_HWPROBE_KEY_IMA_EXT_0: - ZVE32X - ZVE32F - ZVE64X - ZVE64F - ZVE64D Extensions ending with an X indicates that the platform doesn't have a vector FPU. Extensions ending with F/D mean that whether single (F) or double (D) precision vector operation is supported. The number 32 or 64 follows from ZVE tells the maximum element length. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20240510-zve-detection-v5-6-0711bdd26c12@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30riscv: cpufeature: add zve32[xf] and zve64[xfd] isa detectionAndy Chiu
Multiple Vector subextensions are added. Also, the patch takes care of the dependencies of Vector subextensions by macro expansions. So, if some "embedded" platform only reports "zve64f" on the ISA string, the parser is able to expand it to zve32x zve32f zve64x and zve64f. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240510-zve-detection-v5-5-0711bdd26c12@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>