summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2022-01-07KVM: x86: Update vPMCs when retiring instructionsEric Hankland
When KVM retires a guest instruction through emulation, increment any vPMCs that are configured to monitor "instructions retired," and update the sample period of those counters so that they will overflow at the right time. Signed-off-by: Eric Hankland <ehankland@google.com> [jmattson: - Split the code to increment "branch instructions retired" into a separate commit. - Added 'static' to kvm_pmu_incr_counter() definition. - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state == PERF_EVENT_STATE_ACTIVE. ] Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Signed-off-by: Jim Mattson <jmattson@google.com> [likexu: - Drop checks for pmc->perf_event or event state or event type - Increase a counter once its umask bits and the first 8 select bits are matched - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf; - Rename kvm_pmu_record_event to kvm_pmu_trigger_event; - Add counter enable and CPL check for kvm_pmu_trigger_event(); ] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-6-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()Like Xu
Depending on whether intr should be triggered or not, KVM registers two different event overflow callbacks in the perf_event context. The code skeleton of these two functions is very similar, so the pmc->intr can be stored into pmc from pmc_reprogram_counter() which provides smaller instructions footprint against the u-architecture branch predictor. The __kvm_perf_overflow() can be called in non-nmi contexts and a flag is needed to distinguish the caller context and thus avoid a check on kvm_is_in_guest(), otherwise we might get warnings from suspicious RCU or check_preemption_disabled(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-5-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()Like Xu
Since we set the same semantic event value for the fixed counter in pmc->eventsel, returning the perf_hw_id for the fixed counter via find_fixed_event() can be painlessly replaced by pmc_perf_hw_id() with the help of pmc_is_fixed() check. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-4-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()Like Xu
The find_arch_event() returns a "unsigned int" value, which is used by the pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE type perf_event. The returned value is actually the kernel defined generic perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler incoming parameters for better self-explanation. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Setup pmc->eventsel for fixed PMCsLike Xu
The current pmc->eventsel for fixed counter is underutilised. The pmc->eventsel can be setup for all known available fixed counters since we have mapping between fixed pmc index and the intel_arch_events array. Either gp or fixed counter, it will simplify the later checks for consistency between eventsel and perf_hw_id. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86: avoid out of bounds indices for fixed performance countersPaolo Bonzini
Because IceLake has 4 fixed performance counters but KVM only supports 3, it is possible for reprogram_fixed_counters to pass to reprogram_fixed_counter an index that is out of bounds for the fixed_pmc_events array. Ultimately intel_find_fixed_event, which is the only place that uses fixed_pmc_events, handles this correctly because it checks against the size of fixed_pmc_events anyway. Every other place operates on the fixed_counters[] array which is sized according to INTEL_PMC_MAX_FIXED. However, it is cleaner if the unsupported performance counters are culled early on in reprogram_fixed_counters. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URGLai Jiangshan
When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is still vmx->ept_identity_map_addr if EPT + !URG. So VCPU_EXREG_CR3 is considered to be dirty and GUEST_CR3 needs to be updated in this case. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-4-jiangshanlai@gmail.com> Fixes: c62c7bd4f95b ("KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirty") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is changedLai Jiangshan
For shadow paging, the page table needs to be reconstructed before the coming VMENTER if the guest PDPTEs is changed. But not all paths that call load_pdptrs() will cause the page tables to be reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots() are used to launch later reconstruction. The commit d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs() when changing CR0.CD and CR0.NW. The commit 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after load_pdptrs() when rewriting the CR3 with the same value. The commit a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after load_pdptrs() when changing CR4.PGE. Guests like linux would keep the PDPTEs unchanged for every instance of pagetable, so this missing reconstruction has no problem for linux guests. Fixes: d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed") Fixes: 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush") Fixes: a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()Lai Jiangshan
The host CR3 in the vcpu thread can only be changed when scheduling, so commit 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()") changed vmx.c to only save it in vmx_prepare_switch_to_guest(). However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS. vmx_set_host_fs_gs() is called in both places, so rename it to vmx_set_vmcs_host_state() and make it update HOST_CR3. Fixes: 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07Revert "KVM: X86: Update mmu->pdptrs only when it is changed"Paolo Bonzini
This reverts commit 24cd19a28cb7174df502162641d6e1e12e7ffbd9. Sean Christopherson reports: "Commit 24cd19a28cb7 ('KVM: X86: Update mmu->pdptrs only when it is changed') breaks nested VMs with EPT in L0 and PAE shadow paging in L2. Reproducing is trivial, just disable EPT in L1 and run a VM. I haven't investigating how it breaks things." Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07Merge tag 'kvm-riscv-5.17-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 5.17, take #1 - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish
2022-01-07Merge tag 'kvmarm-5.17' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
2022-01-07s390/pci: simplify __pciwb_mio() inline asmNiklas Schnelle
The PCI Write Barrier instruction ignores the registers encoded in it. There is thus no need to explicitly set the register to zero or to associate it with a variable at all. In the resulting binary this removes an unnecessary lghi and it makes the code simpler. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-06Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-01-06 We've added 41 non-merge commits during the last 2 day(s) which contain a total of 36 files changed, 1214 insertions(+), 368 deletions(-). The main changes are: 1) Various fixes in the verifier, from Kris and Daniel. 2) Fixes in sockmap, from John. 3) bpf_getsockopt fix, from Kuniyuki. 4) INET_POST_BIND fix, from Menglong. 5) arm64 JIT fix for bpf pseudo funcs, from Hou. 6) BPF ISA doc improvements, from Christoph. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits) bpf: selftests: Add bind retry for post_bind{4, 6} bpf: selftests: Use C99 initializers in test_sock.c net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() bpf/selftests: Test bpf_d_path on rdonly_mem. libbpf: Add documentation for bpf_map batch operations selftests/bpf: Don't rely on preserving volatile in PT_REGS macros in loop3 xdp: Add xdp_do_redirect_frame() for pre-computed xdp_frames xdp: Move conversion to xdp_frame out of map functions page_pool: Store the XDP mem id page_pool: Add callback to init pages when they are allocated xdp: Allow registering memory model without rxq reference samples/bpf: xdpsock: Add timestamp for Tx-only operation samples/bpf: xdpsock: Add time-out for cleaning Tx samples/bpf: xdpsock: Add sched policy and priority support samples/bpf: xdpsock: Add cyclic TX operation capability samples/bpf: xdpsock: Add clockid selection support samples/bpf: xdpsock: Add Dest and Src MAC setting for Tx-only operation samples/bpf: xdpsock: Add VLAN support for Tx-only operation libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API libbpf 1.0: Deprecate bpf_map__is_offload_neutral() ... ==================== Link: https://lore.kernel.org/r/20220107013626.53943-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07parisc: io: Improve the outb(), outw() and outl() macrosBart Van Assche
This patch fixes the following build error for source file drivers/scsi/pcmcia/sym53c500_cs.c: In file included from ./include/linux/bug.h:5, from ./include/linux/cpumask.h:14, from ./include/linux/mm_types_task.h:14, from ./include/linux/mm_types.h:5, from ./include/linux/buildid.h:5, from ./include/linux/module.h:14, from drivers/scsi/pcmcia/sym53c500_cs.c:42: drivers/scsi/pcmcia/sym53c500_cs.c: In function ‘SYM53C500_intr’: ./arch/parisc/include/asm/bug.h:28:2: error: expected expression before ‘do’ 28 | do { \ | ^~ ./arch/parisc/include/asm/io.h:276:20: note: in expansion of macro ‘BUG’ 276 | #define outb(x, y) BUG() | ^~~ drivers/scsi/pcmcia/sym53c500_cs.c:124:19: note: in expansion of macro ‘outb’ 124 | #define REG0(x) (outb(C4_IMG, (x) + CONFIG4)) | ^~~~ drivers/scsi/pcmcia/sym53c500_cs.c:362:2: note: in expansion of macro ‘REG0’ 362 | REG0(port_base); | ^~~~ Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-parisc@vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Add kgdb io_module to read chars via PDCHelge Deller
Add a simplistic keyboard driver for usage of PDC I/O functions with kgdb. This driver makes it possible to use KGDB with QEMU. Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Fix pdc_toc_pim_11 and pdc_toc_pim_20 definitionsHelge Deller
The definitions for pdc_toc_pim_11 and pdc_toc_pim_20 are wrong since they include an entry for a hversion field which doesn't exist in the specification. Fix this and clean up some whitespaces so that the whole file will be in sync with it's copy in the SeaBIOS-hppa sources. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.16
2022-01-07parisc: Add lws_atomic_xchg and lws_atomic_store syscallsJohn David Anglin
This patch adds two new LWS routines - lws_atomic_xchg and lws_atomic_store. These are simpler than the CAS routines. Currently, we use the CAS routines for atomic stores. This is inefficient since it requires both winning the spinlock and a successful CAS operation. Change has been tested on c8000 and rp3440. In v2, I moved the code to disble/enable page faults inside the spinlocks. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Rewrite light-weight syscall and futex codeJohn David Anglin
The parisc architecture lacks general hardware support for compare and swap. Particularly for userspace, it is difficult to implement software atomic support. Page faults in critical regions can cause processes to sleep and block the forward progress of other processes. Thus, it is essential that page faults be disabled in critical regions. For performance reasons, we also need to disable external interrupts in critical regions. In order to do this, we need a mechanism to trigger COW breaks outside the critical region. Fortunately, parisc has the "stbys,e" instruction. When the leftmost byte of a word is addressed, this instruction triggers all the exceptions of a normal store but it does not write to memory. Thus, we can use it to trigger COW breaks outside the critical region without modifying the data that is to be updated atomically. COW breaks occur randomly. So even if we have priviously executed a "stbys,e" instruction, we still need to disable pagefaults around the critical region. If a fault occurs in the critical region, we return -EAGAIN. I had to add a wrapper around _arch_futex_atomic_op_inuser() as I found in testing that returning -EAGAIN caused problems for some processes even though it is listed as a possible return value. The patch implements the above. The code no longer attempts to sleep with interrupts disabled and I haven't seen any stalls with the change. I have attempted to merge common code and streamline the fast path. In the futex code, we only compute the spinlock address once. I eliminated some debug code in the original CAS routine that just made the flow more complicated. I don't clip the arguments when called from wide mode. As a result, the LWS routines should work when called from 64-bit processes. I defined TASK_PAGEFAULT_DISABLED offset for use in the lws_pagefault_disable and lws_pagefault_enable macros. Since we now disable interrupts on the gateway page where necessary, it might be possible to allow processes to be scheduled when they are on the gateway page. Change has been tested on c8000 and rp3440. It improves glibc build and test time by about 10%. In v2, I removed the lws_atomic_xchg and and lws_atomic_store calls. I also removed the bug fixes that were not directly related to this patch. In v3, I removed the code to force interruptions from arch_futex_atomic_op_inuser(). It is always called with page faults disabled, so this code had no effect. In v4, I fixed a typo in depi_safe line. In v5, I moved the code to disable/enable page faults inside the spinlocks. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Enhance page fault termination messageJohn David Anglin
In debugging kernel panics, I believe it is useful to know what type of page fault caused the termination. "Bad Address" is too vague. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Don't call faulthandler_disabled() in do_page_fault()John David Anglin
It is dangerous to call faulthandler_disabled() when user_mode(regs) is true. The task pagefault_disabled counter is racy and it is not updated atomically on parisc. As a result, calling faulthandler_disabled() may cause erroneous termination. We now handle execption fixups and termination when user_mode(regs) is false in handle_interruption(). Thus, we can just remove the faulthandler_disabled() check from do_page_fault(). Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Switch user access functions to signal errors in r29 instead of r8Helge Deller
Use register r29 instead of register r8 to signal faults when accessing user memory. In case of faults, the fixup routine will store -EFAULT in this register. This change saves up to 752 bytes on a 32bit kernel, partly because the compiler doesn't need to save and restore the old r8 value on the stack. bloat-o-meter results for usage with r29 register: add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752) bloat-o-meter results for usage with r28 register: add/remove: 0/0 grow/shrink: 28/83 up/down: 296/-956 (-660) Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Avoid calling faulthandler_disabled() twiceJohn David Anglin
In handle_interruption(), we call faulthandler_disabled() to check whether the fault handler is not disabled. If the fault handler is disabled, we immediately call do_page_fault(). It then calls faulthandler_disabled(). If disabled, do_page_fault() attempts to fixup the exception by jumping to no_context: no_context: if (!user_mode(regs) && fixup_exception(regs)) { return; } parisc_terminate("Bad Address (null pointer deref?)", regs, code, address); Apart from the error messages, the two blocks of code perform the same function. We can avoid two calls to faulthandler_disabled() by a simple revision to the code in handle_interruption(). Note: I didn't try to fix the formatting of this code block. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07parisc: Fix lpa and lpa_user definesJohn David Anglin
While working on the rewrite to the light-weight syscall and futex code, I experimented with using a hash index based on the user physical address of atomic variable. This exposed two problems with the lpa and lpa_user defines. Because of the copy instruction, the pa argument needs to be an early clobber argument. This prevents gcc from allocating the va and pa arguments to the same register. Secondly, the lpa instruction can cause a page fault so we need to catch exceptions. Signed-off-by: John David Anglin <dave.anglin@bell.net> Fixes: 116d753308cf ("parisc: Use lpa instruction to load physical addresses in driver code") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.2+
2022-01-07parisc: Define depi_safe macroJohn David Anglin
The depi instruction is similar to the extru instruction on 64-bit machines. It leaves the most-significant 32 bits of the target register in an undefined state. On 64-bit machines, the macro uses depdi to perform safe deposits in the least-significant 32 bits. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-07lib/crypto: blake2s: include as built-inJason A. Donenfeld
In preparation for using blake2s in the RNG, we change the way that it is wired-in to the build system. Instead of using ifdefs to select the right symbol, we use weak symbols. And because ARM doesn't need the generic implementation, we make the generic one default only if an arch library doesn't need it already, and then have arch libraries that do need it opt-in. So that the arch libraries can remain tristate rather than bool, we then split the shash part from the glue code. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: remove unused irq_flags argument from add_interrupt_randomness()Sebastian Andrzej Siewior
Since commit ee3e00e9e7101 ("random: use registers from interrupted code for CPU's w/o a cycle counter") the irq_flags argument is no longer used. Remove unused irq_flags. Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Liu <wei.liu@kernel.org> Cc: linux-hyperv@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-06x86, sched: Fix undefined reference to init_freq_invariance_cppc() build errorHuang Rui
The init_freq_invariance_cppc function is implemented in smpboot and depends on CONFIG_SMP. MODPOST vmlinux.symvers MODINFO modules.builtin.modinfo GEN modules.builtin LD .tmp_vmlinux.kallsyms1 ld: drivers/acpi/cppc_acpi.o: in function `acpi_cppc_processor_probe': /home/ray/brahma3/linux/drivers/acpi/cppc_acpi.c:819: undefined reference to `init_freq_invariance_cppc' make: *** [Makefile:1161: vmlinux] Error 1 See https://lore.kernel.org/lkml/484af487-7511-647e-5c5b-33d4429acdec@infradead.org/. Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Huang Rui <ray.huang@amd.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-06RISC-V: Clean up the defconfigsPalmer Dabbelt
It's been a while since cleaning up the defconfigs, so I manually checked up on each change. This found a handful of minor issues, which have been fixed in-line.
2022-01-06RISC-V: defconfigs: Remove redundant K210 DT sourcePalmer Dabbelt
The "k210_generic" DT has been the default in Kconfig since 67d96729a9e ("riscv: Update Canaan Kendryte K210 device tree"), so drop it from the defconfigs to avoid diff with savedefconfig. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-06bootmem: Use page->index instead of page->freelistMatthew Wilcox (Oracle)
page->freelist is for the use of slab. Using page->index is the same set of bits as page->freelist, and by using an integer instead of a pointer, we can avoid casts. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: <x86@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com>
2022-01-06powerpc/xmon: Dump XIVE information for online-only processors.Sachin Sant
dxa command in XMON debugger iterates through all possible processors. As a result, empty lines are printed even for processors which are not online. CPU 47:pp=00 CPPR=ff IPI=0x0040002f PQ=-- EQ idx=699 T=0 00000000 00000000 CPU 48: CPU 49: Restrict XIVE information(dxa) to be displayed for online processors only. Signed-off-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/164139226833.12930.272224382183014664.sendpatchset@MacBook-Pro.local
2022-01-06KVM: RISC-V: Avoid spurious virtual interrupts after clearing hideleg CSRVincent Chen
When the last VM is terminated, the host kernel will invoke function hardware_disable_nolock() on each CPU to disable the related virtualization functions. Here, RISC-V currently only clears hideleg CSR and hedeleg CSR. This behavior will cause the host kernel to receive spurious interrupts if hvip CSR has pending interrupts and the corresponding enable bits in vsie CSR are asserted. To avoid it, hvip CSR and vsie CSR must be cleared before clearing hideleg CSR. Fixes: 99cdc6c18c2d ("RISC-V: Add initial skeletal KVM support") Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Add VM capability to allow userspace get GPA bitsAnup Patel
The number of GPA bits supported for a RISC-V Guest/VM is based on the MMU mode used by the G-stage translation. The KVM RISC-V will detect and use the best possible MMU mode for the G-stage in kvm_arch_init(). We add a generic VM capability KVM_CAP_VM_GPA_BITS which can be used by the KVM userspace to get the number of GPA (guest physical address) bits supported for a Guest/VM. Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06RISC-V: KVM: Forward SBI experimental and vendor extensionsAnup Patel
The SBI experimental extension space is for temporary (or experimental) stuff whereas SBI vendor extension space is for hardware vendor specific stuff. Both these SBI extension spaces won't be standardized by the SBI specification so let's blindly forward such SBI calls to the userspace. Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06RISC-V: KVM: make kvm_riscv_vcpu_fp_clean() staticJisheng Zhang
There are no users outside vcpu_fp.c so make kvm_riscv_vcpu_fp_clean() static. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Add SBI HSM extension in KVMAtish Patra
SBI HSM extension allows OS to start/stop harts any time. It also allows ordered booting of harts instead of random booting. Implement SBI HSM exntesion and designate the vcpu 0 as the boot vcpu id. All other non-zero non-booting vcpus should be brought up by the OS implementing HSM extension. If the guest OS doesn't implement HSM extension, only single vcpu will be available to OS. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2Atish Patra
The SBI v0.2 contains some of the improved versions of required v0.1 extensions such as remote fence, timer and IPI. This patch implements those extensions. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Add SBI v0.2 base extensionAtish Patra
SBI v0.2 base extension defined to allow backward compatibility and probing of future extensions. This is also the only mandatory SBI extension that must be implemented by SBI implementors. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own fileAtish Patra
With SBI v0.2, there may be more SBI extensions in future. It makes more sense to group related extensions in separate files. Guest kernel will choose appropriate SBI version dynamically. Move the existing implementation to a separate file so that it can be removed in future without much conflict. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06RISC-V: KVM: Mark the existing SBI implementation as v0.1Atish Patra
The existing SBI specification impelementation follows v0.1 specification. The latest specification allows more scalability and performance improvements. Rename the existing implementation as v0.1 and provide a way to allow future extensions. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06KVM: RISC-V: Use common KVM implementation of MMU memory cachesSean Christopherson
Use common KVM's implementation of the MMU memory caches, which for all intents and purposes is semantically identical to RISC-V's version, the only difference being that the common implementation will fall back to an atomic allocation if there's a KVM bug that triggers a cache underflow. RISC-V appears to have based its MMU code on arm64 before the conversion to the common caches in commit c1a33aebe91d ("KVM: arm64: Use common KVM implementation of MMU memory caches"), despite having also copy-pasted the definition of KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE in kvm_types.h. Opportunistically drop the superfluous wrapper kvm_riscv_stage2_flush_cache(), whose name is very, very confusing as "cache flush" in the context of MMU code almost always refers to flushing hardware caches, not freeing unused software objects. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06arm/xen: Read extended regions from DT and init Xen resourceOleksandr Tyshchenko
This patch implements arch_xen_unpopulated_init() on Arm where the extended regions (if any) are gathered from DT and inserted into specific Xen resource to be used as unused address space for Xen scratch pages by unpopulated-alloc code. The extended region (safe range) is a region of guest physical address space which is unused and could be safely used to create grant/foreign mappings instead of wasting real RAM pages from the domain memory for establishing these mappings. The extended regions are chosen by the hypervisor at the domain creation time and advertised to it via "reg" property under hypervisor node in the guest device-tree. As region 0 is reserved for grant table space (always present), the indexes for extended regions are 1...N. If arch_xen_unpopulated_init() fails for some reason the default behaviour will be restored (allocate xenballooned pages). This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-6-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DTOleksandr Tyshchenko
Read the start address of the grant table space from DT (region 0). This patch mostly restores behaviour before commit 3cf4095d7446 ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table") but trying not to break the ACPI support added after that commit. So the patch touches DT part only and leaves the ACPI part with xen_xlate_map_ballooned_pages(). Also in order to make a code more resilient use a fallback to xen_xlate_map_ballooned_pages() if grant table region wasn't found. This is a preparation for using Xen extended region feature where unused regions of guest physical address space (provided by the hypervisor) will be used to create grant/foreign/whatever mappings instead of wasting real RAM pages from the domain memory for establishing these mappings. The immediate benefit of this change: - Avoid superpage shattering in Xen P2M when establishing stage-2 mapping (GFN <-> MFN) for the grant table space - Avoid wasting real RAM pages (reducing the amount of memory usuable) for mapping grant table space - The grant table space is always mapped at the exact same place (region 0 is reserved for the grant table) Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-3-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06xen/x86: obtain upper 32 bits of video frame buffer address for Dom0Jan Beulich
The hypervisor has been supplying this information for a couple of major releases. Make use of it. The need to set a flag in the capabilities field also points out that the prior setting of that field from the hypervisor interface's gbl_caps one was wrong, so that code gets deleted (there's also no equivalent of this in native boot code). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/a3df8bf3-d044-b7bb-3383-cd5239d6d4af@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-05RISC-V: defconfigs: Remove redundant CONFIG_EFI=yPalmer Dabbelt
We've always had CONFIG_EFI as "def_bool y" so this has always been redundant. It's removed by savedefconfig, so drop it to keep things clean. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-05RISC-V: defconfigs: Remove redundant CONFIG_POWER_RESETPalmer Dabbelt
As of ab7fbad0c7d7 ("riscv: Fix unmet direct dependencies built based on SOC_VIRT") we select CONFIG_POWER_RESET=y along with CONFIG_SOC_VIRT, which is already in defconfig. This make setting CONFIG_POWER_RESET in the defconfigs redundant, so remove it to remain consistent with savedefconfig. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-05RISC-V: defconfigs: Sort CONFIG_BLK_DEV_BSGPalmer Dabbelt
This should have no functional change, it just sorts CONFIG_BLK_DEV_BSG the same way savedefconfig does. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-05RISC-V: defconfigs: Sort CONFIG_SURFACE_PLATFORMSPalmer Dabbelt
This should have no functional change, it just sorts CONFIG_SURFACE_PLATFORMS the same way savedefconfig does. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-05RISC-V: defconfigs: Sort CONFIG_MMCPalmer Dabbelt
This should have no functional change, it just sorts CONFIG_MMC the same way savedefconfig does. This only touches the rv64 defconfig because rv32_defconfig was already sorted correctly. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>